path toward faster partition pruning
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.
Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful. They are to be applied on top of
the patches being discussed at [1]. Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning. I will share that patch after
debugging it some more.
The main design goal of the patches I'm sharing here now is to defer the
locking and opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child. ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time. With these patches, although we no
longer lock and open the partition child tables in
expand_inherited_rtentry(), their RT entries are still created and added to
root->parse->rtable, so that setup_simple_rel_arrays() knows the maximum
number of entries root->simple_rel_array will need to hold and allocates the
memory for that array accordingly. Slots in simple_rel_array[]
corresponding to partition child tables remain empty until
set_append_rel_size() is called on the root parent table, determines the
partitions that will actually be scanned, and creates their RelOptInfos.
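To illustrate the idea of sizing the array up front but filling slots lazily, here is a minimal standalone sketch. It is not code from the patches: DemoPlanner, DemoRel, demo_setup_rel_array() and demo_get_or_build_rel() are hypothetical names, and "opening" a table is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a RelOptInfo; not the planner's real struct. */
typedef struct DemoRel
{
    int  rti;     /* range table index this slot belongs to */
    bool opened;  /* true once the (pretend) table was locked and opened */
} DemoRel;

/* Hypothetical stand-in for the relevant part of the planner state. */
typedef struct DemoPlanner
{
    int       array_size;  /* number of slots, sized from the range table */
    DemoRel **rel_array;   /* slot i stays NULL until rel i is needed */
    int       num_opened;  /* how many children were actually opened */
} DemoPlanner;

/* Allocate one slot per range table entry, but build nothing yet. */
DemoPlanner *
demo_setup_rel_array(int num_rtes)
{
    DemoPlanner *root = malloc(sizeof(DemoPlanner));

    assert(root != NULL);
    root->array_size = num_rtes + 1;  /* slot 0 unused; indexes start at 1 */
    root->rel_array = calloc((size_t) root->array_size, sizeof(DemoRel *));
    assert(root->rel_array != NULL);
    root->num_opened = 0;
    return root;
}

/* Build the slot only when the child is known to be scanned. */
DemoRel *
demo_get_or_build_rel(DemoPlanner *root, int rti)
{
    assert(rti > 0 && rti < root->array_size);
    if (root->rel_array[rti] == NULL)
    {
        DemoRel *rel = malloc(sizeof(DemoRel));

        assert(rel != NULL);
        rel->rti = rti;
        rel->opened = true;  /* a real implementation would lock and open here */
        root->rel_array[rti] = rel;
        root->num_opened++;
    }
    return root->rel_array[rti];
}
```

The point of the sketch is only that the cost of opening a child is paid per slot actually requested, while the array itself can still be allocated once from the range table length.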
The patch augments the existing PartitionedChildRelInfo node, which currently
holds only the partitioned child rel RT indexes, to carry some more
information about the partition tree, which includes the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in the bound order).
Since PartitionedChildRelInfo now contains more information about the
partition tree than it used to, the struct's name no longer seemed apt, so
I renamed it to PartitionRootInfo and renamed root->pcinfo_list to
prinfo_list accordingly. That seems okay because we only use that node
internally.
Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table. The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables. Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).
When set_append_rel_size() is called on the root partitioned table, it
will call find_partitions_for_query(), which, using the partition tree
information, determines the partitions that will need to be scanned for
the query. This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on. As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
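The recursion described above can be sketched roughly as follows. This is only an illustration with made-up types (DemoPart, DemoAppendInfo, DemoResult) and fixed-size arrays instead of Lists; since the pruning logic itself is not part of these patches, the sketch selects every partition. For simplicity it also treats a partitioned table with zero partitions as a leaf, which is an assumption of the sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 16

/* Hypothetical partition tree node; nchildren == 0 means a leaf here. */
typedef struct DemoPart
{
    int relid;
    int nchildren;
    const struct DemoPart *children[DEMO_MAX];
} DemoPart;

/* The (parent_relid, list-of-partition-relids-to-scan) pair. */
typedef struct DemoAppendInfo
{
    int parent_relid;
    int nparts;
    int part_relids[DEMO_MAX];  /* immediate partitions only */
} DemoAppendInfo;

typedef struct DemoResult
{
    int            nappend;
    DemoAppendInfo append[DEMO_MAX];  /* root's entry comes first */
    int            nleaf;
    int            leaf_relids[DEMO_MAX];  /* all leaves found so far */
} DemoResult;

/* Record parent's partitions, then recurse into partitioned ones. */
void
demo_find_partitions(const DemoPart *parent, DemoResult *res)
{
    DemoAppendInfo *info;
    int             i;

    assert(parent->nchildren > 0);
    assert(res->nappend < DEMO_MAX);
    info = &res->append[res->nappend++];
    info->parent_relid = parent->relid;
    info->nparts = 0;

    for (i = 0; i < parent->nchildren; i++)
    {
        const DemoPart *child = parent->children[i];

        /* Real pruning would decide here whether to skip this child. */
        info->part_relids[info->nparts++] = child->relid;

        if (child->nchildren > 0)
            demo_find_partitions(child, res);
        else
        {
            assert(res->nleaf < DEMO_MAX);
            res->leaf_relids[res->nleaf++] = child->relid;
        }
    }
}
```

For a root (relid 1) with a leaf 11 and a sub-partitioned child 2 that has leaves 12 and 13, this yields two pairs, (1, [11, 2]) and (2, [12, 13]), plus the leaf list [11, 12, 13].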
Once all partitions have been determined, we turn to locking the leaf
partitions. The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node. For each OID in that list, we check whether it is
in the set of leaf partition OIDs that was just computed and, if so, lock
it. For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partition-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list. Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected. The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().
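The locking step can be pictured with the following standalone sketch. It is not the patch's code: DemoOid and both function names are hypothetical, membership is tested with a simple linear search, and "locking" just records the OID in an output array so the resulting order can be inspected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;  /* hypothetical stand-in for Oid */

/* Linear membership test; fine for a sketch, not tuned for many OIDs. */
static bool
demo_oid_in_set(DemoOid oid, const DemoOid *set, size_t nset)
{
    size_t i;

    for (i = 0; i < nset; i++)
    {
        if (set[i] == oid)
            return true;
    }
    return false;
}

/*
 * Walk the OIDs in their original (deadlock-safe) order and "lock" only
 * those that were chosen.  Returns the number of OIDs locked; locked_out
 * must have room for nordered entries.
 */
size_t
demo_lock_chosen_leaves(const DemoOid *ordered_oids, size_t nordered,
                        const DemoOid *chosen, size_t nchosen,
                        DemoOid *locked_out)
{
    size_t nlocked = 0;
    size_t i;

    assert(ordered_oids != NULL && locked_out != NULL);
    for (i = 0; i < nordered; i++)
    {
        if (demo_oid_in_set(ordered_oids[i], chosen, nchosen))
        {
            /* A real implementation would take the relation lock here. */
            locked_out[nlocked++] = ordered_oids[i];
        }
    }
    return nlocked;
}
```

Because the outer loop follows the stored order rather than the order in which partitions were chosen, every backend ends up taking locks in the same sequence even when it selects a different subset.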
set_append_rel_size() processing then continues for the root partitioned
table. It is at this point that we will create the RelOptInfos and
AppendRelInfos for partitions: first for those of the root partitioned
table, and then for those of each partitioned table when
set_append_rel_size() is recursively called for the latter.
Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].
Thanks,
Amit
[1]: /messages/by-id/befd7ec9-8f4c-6928-d330-ab05dbf860bf@lab.ntt.co.jp
Attachments:
0001-Teach-pg_inherits.c-a-bit-about-partitioning.patch (text/plain; charset=UTF-8)
From 567e07fa19af575ece50f607a4374c370ae7375f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 8 Aug 2017 18:42:30 +0900
Subject: [PATCH 1/3] Teach pg_inherits.c a bit about partitioning
Both find_inheritance_children and find_all_inheritors now list
partitioned child tables before non-partitioned ones and return
the number of partitioned tables in an optional output argument.
We also now store in pg_inherits, when adding a new child, if the
child is a partitioned table.
Per design idea from Robert Haas
---
contrib/sepgsql/dml.c | 2 +-
doc/src/sgml/catalogs.sgml | 10 +++
src/backend/catalog/partition.c | 2 +-
src/backend/catalog/pg_inherits.c | 157 ++++++++++++++++++++++++++-------
src/backend/commands/analyze.c | 3 +-
src/backend/commands/lockcmds.c | 2 +-
src/backend/commands/publicationcmds.c | 2 +-
src/backend/commands/tablecmds.c | 56 +++++++-----
src/backend/commands/vacuum.c | 3 +-
src/backend/executor/execMain.c | 3 +-
src/backend/optimizer/prep/prepunion.c | 2 +-
src/include/catalog/pg_inherits.h | 20 ++++-
src/include/catalog/pg_inherits_fn.h | 5 +-
13 files changed, 200 insertions(+), 67 deletions(-)
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index b643720e36..6fc279805c 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,7 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
if (!rte->inh)
tableIds = list_make1_oid(rte->relid);
else
- tableIds = find_all_inheritors(rte->relid, NoLock, NULL);
+ tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
foreach(li, tableIds)
{
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef7054cf26..00ba2906c2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3894,6 +3894,16 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt><
inherited columns are to be arranged. The count starts at 1.
</entry>
</row>
+
+ <row>
+ <entry><structfield>inhpartitioned</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>
+ This is <literal>true</> if the child table is a partitioned table,
+ <literal>false</> otherwise
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7618e4cb31..36f5c80b4f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,7 +196,7 @@ RelationBuildPartitionDesc(Relation rel)
return;
/* Get partition oids from pg_inherits */
- inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+ inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
/* Collect bound spec nodes in a list */
i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 245a374fc9..5292ec8058 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -33,6 +33,8 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
+static int32 inhchildinfo_cmp(const void *p1, const void *p2);
+
/*
* Entry of a hash table used in find_all_inheritors. See below.
*/
@@ -42,6 +44,30 @@ typedef struct SeenRelsEntry
ListCell *numparents_cell; /* corresponding list cell */
} SeenRelsEntry;
+/* Information about one inheritance child table. */
+typedef struct InhChildInfo
+{
+ Oid relid;
+ bool is_partitioned;
+} InhChildInfo;
+
+#define OID_CMP(o1, o2) \
+ ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0))
+
+static int32
+inhchildinfo_cmp(const void *p1, const void *p2)
+{
+ InhChildInfo c1 = *((const InhChildInfo *) p1);
+ InhChildInfo c2 = *((const InhChildInfo *) p2);
+
+ if (c1.is_partitioned && !c2.is_partitioned)
+ return -1;
+ if (!c1.is_partitioned && c2.is_partitioned)
+ return 1;
+
+ return OID_CMP(c1.relid, c2.relid);
+}
+
/*
* find_inheritance_children
*
@@ -54,7 +80,8 @@ typedef struct SeenRelsEntry
* against possible DROPs of child relations.
*/
List *
-find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ int *num_partitioned_children)
{
List *list = NIL;
Relation relation;
@@ -62,9 +89,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
ScanKeyData key[1];
HeapTuple inheritsTuple;
Oid inhrelid;
- Oid *oidarr;
- int maxoids,
- numoids,
+ InhChildInfo *inhchildren;
+ int maxchildren,
+ numchildren,
+ my_num_partitioned_children,
i;
/*
@@ -77,9 +105,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
/*
* Scan pg_inherits and build a working array of subclass OIDs.
*/
- maxoids = 32;
- oidarr = (Oid *) palloc(maxoids * sizeof(Oid));
- numoids = 0;
+ maxchildren = 32;
+ inhchildren = (InhChildInfo *) palloc(maxchildren * sizeof(InhChildInfo));
+ numchildren = 0;
+ my_num_partitioned_children = 0;
relation = heap_open(InheritsRelationId, AccessShareLock);
@@ -93,34 +122,45 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
while ((inheritsTuple = systable_getnext(scan)) != NULL)
{
- inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
- if (numoids >= maxoids)
+ Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(inheritsTuple);
+
+ if (numchildren >= maxchildren)
{
- maxoids *= 2;
- oidarr = (Oid *) repalloc(oidarr, maxoids * sizeof(Oid));
+ maxchildren *= 2;
+ inhchildren = (InhChildInfo *) repalloc(inhchildren,
+ maxchildren * sizeof(InhChildInfo));
}
- oidarr[numoids++] = inhrelid;
+ inhchildren[numchildren].relid = form->inhrelid;
+ inhchildren[numchildren].is_partitioned = form->inhpartitioned;
+
+ if (form->inhpartitioned)
+ my_num_partitioned_children++;
+ numchildren++;
}
systable_endscan(scan);
heap_close(relation, AccessShareLock);
+ if (num_partitioned_children)
+ *num_partitioned_children = my_num_partitioned_children;
+
/*
* If we found more than one child, sort them by OID. This ensures
* reasonably consistent behavior regardless of the vagaries of an
* indexscan. This is important since we need to be sure all backends
* lock children in the same order to avoid needless deadlocks.
*/
- if (numoids > 1)
- qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
+ if (numchildren > 1)
+ qsort(inhchildren, numchildren, sizeof(InhChildInfo),
+ inhchildinfo_cmp);
/*
* Acquire locks and build the result list.
*/
- for (i = 0; i < numoids; i++)
+ for (i = 0; i < numchildren; i++)
{
- inhrelid = oidarr[i];
+ inhrelid = inhchildren[i].relid;
if (lockmode != NoLock)
{
@@ -144,7 +184,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
list = lappend_oid(list, inhrelid);
}
- pfree(oidarr);
+ pfree(inhchildren);
return list;
}
@@ -159,19 +199,30 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* given rel.
*
* The specified lock type is acquired on all child relations (but not on the
- * given rel; caller should already have locked it). If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is specified, in which case, only the
+ * child relations that are partitioned tables are locked. If lockmode is
+ * NoLock then no locks are acquired, but caller must beware of race
+ * conditions against possible DROPs of child relations.
+ *
+ * Returned list of OIDs is such that all the partitioned tables in the tree
+ * appear at the head of the list. If num_partitioned_children is non-NULL,
+ * *num_partitioned_children returns the number of partitioned child table
+ * OIDs at the head of the list.
*/
List *
-find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
+find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ List **numparents, int *num_partitioned_children)
{
/* hash table for O(1) rel_oid -> rel_numparents cell lookup */
HTAB *seen_rels;
HASHCTL ctl;
List *rels_list,
- *rel_numparents;
+ *rel_numparents,
+ *partitioned_rels_list,
+ *other_rels_list;
ListCell *l;
+ int my_num_partitioned_children;
memset(&ctl, 0, sizeof(ctl));
ctl.keysize = sizeof(Oid);
@@ -185,31 +236,69 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
/*
* We build a list starting with the given rel and adding all direct and
- * indirect children. We can use a single list as both the record of
- * already-found rels and the agenda of rels yet to be scanned for more
- * children. This is a bit tricky but works because the foreach() macro
- * doesn't fetch the next list element until the bottom of the loop.
+ * indirect children. We can use a single list (rels_list) as both the
+ * record of already-found rels and the agenda of rels yet to be scanned
+ * for more children. This is a bit tricky but works because the foreach()
+ * macro doesn't fetch the next list element until the bottom of the loop.
+ *
+ * partitioned_rels_list will contain the OIDs of the partitioned child
+ * tables and other_rels_list will contain the OIDs of the non-partitioned
+ * child tables. Result list will be generated by concatenating the two
+ * lists together with partitioned_rels_list appearing first.
*/
rels_list = list_make1_oid(parentrelId);
+ partitioned_rels_list = list_make1_oid(parentrelId);
+ other_rels_list = NIL;
rel_numparents = list_make1_int(0);
+ my_num_partitioned_children = 0;
+
foreach(l, rels_list)
{
Oid currentrel = lfirst_oid(l);
List *currentchildren;
- ListCell *lc;
+ ListCell *lc,
+ *first_nonpartitioned_child;
+ int cur_num_partitioned_children = 0,
+ i;
/* Get the direct children of this rel */
- currentchildren = find_inheritance_children(currentrel, lockmode);
+ currentchildren = find_inheritance_children(currentrel, lockmode,
+ &cur_num_partitioned_children);
+
+ my_num_partitioned_children += cur_num_partitioned_children;
+
+ /*
+ * Append partitioned children to rels_list and partitioned_rels_list.
+ * We know for sure that partitioned children don't need the
+ * de-duplication logic in the following loop, because partitioned
+ * tables are not allowed to participate in multiple inheritance.
+ */
+ i = 0;
+ foreach(lc, currentchildren)
+ {
+ if (i < cur_num_partitioned_children)
+ {
+ Oid child_oid = lfirst_oid(lc);
+
+ rels_list = lappend_oid(rels_list, child_oid);
+ partitioned_rels_list = lappend_oid(partitioned_rels_list,
+ child_oid);
+ }
+ else
+ break;
+ i++;
+ }
+ first_nonpartitioned_child = lc;
/*
* Add to the queue only those children not already seen. This avoids
* making duplicate entries in case of multiple inheritance paths from
* the same parent. (It'll also keep us from getting into an infinite
* loop, though theoretically there can't be any cycles in the
- * inheritance graph anyway.)
+ * inheritance graph anyway.) Also, add them to the other_rels_list.
*/
- foreach(lc, currentchildren)
+ for_each_cell(lc, first_nonpartitioned_child)
{
Oid child_oid = lfirst_oid(lc);
bool found;
@@ -225,6 +314,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
{
/* if it's not there, add it. expect 1 parent, initially. */
rels_list = lappend_oid(rels_list, child_oid);
+ other_rels_list = lappend_oid(other_rels_list, child_oid);
rel_numparents = lappend_int(rel_numparents, 1);
hash_entry->numparents_cell = rel_numparents->tail;
}
@@ -237,8 +327,13 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
list_free(rel_numparents);
hash_destroy(seen_rels);
+ list_free(rels_list);
+
+ if (num_partitioned_children)
+ *num_partitioned_children = my_num_partitioned_children;
- return rels_list;
+ /* List partitioned child tables before non-partitioned ones. */
+ return list_concat(partitioned_rels_list, other_rels_list);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fbad13ea94..10cc2b8314 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,7 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
* the children.
*/
tableOIDs =
- find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL);
+ find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
+ NULL);
/*
* Check that there's at least one descendant, else fail. This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 9fe9e022b0..529f244f7e 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
List *children;
ListCell *lc;
- children = find_inheritance_children(reloid, NoLock);
+ children = find_inheritance_children(reloid, NoLock, NULL);
foreach(lc, children)
{
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 610cb499d2..64179ea3ef 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
List *children;
children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
- NULL);
+ NULL, NULL);
foreach(child, children)
{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0f08245a67..4d686a6f71 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -299,10 +299,10 @@ static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
static void StoreCatalogInheritance(Oid relationId, List *supers,
- bool child_is_partition);
+ bool child_is_partition, bool child_is_partitioned);
static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
int16 seqNumber, Relation inhRelation,
- bool child_is_partition);
+ bool child_is_partition, bool child_is_partitioned);
static int findAttrByName(const char *attributeName, List *schema);
static void AlterIndexNamespaces(Relation classRel, Relation rel,
Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -753,7 +753,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
typaddress);
/* Store inheritance information for new rel. */
- StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
+ StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL,
+ relkind == RELKIND_PARTITIONED_TABLE);
/*
* We must bump the command counter to make the newly-created relation
@@ -1238,7 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
ListCell *child;
List *children;
- children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL);
+ children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
+ NULL);
foreach(child, children)
{
@@ -2305,7 +2307,7 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
*/
static void
StoreCatalogInheritance(Oid relationId, List *supers,
- bool child_is_partition)
+ bool child_is_partition, bool child_is_partitioned)
{
Relation relation;
int16 seqNumber;
@@ -2336,7 +2338,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
Oid parentOid = lfirst_oid(entry);
StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
- child_is_partition);
+ child_is_partition, child_is_partitioned);
seqNumber++;
}
@@ -2350,7 +2352,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
static void
StoreCatalogInheritance1(Oid relationId, Oid parentOid,
int16 seqNumber, Relation inhRelation,
- bool child_is_partition)
+ bool child_is_partition, bool child_is_partitioned)
{
TupleDesc desc = RelationGetDescr(inhRelation);
Datum values[Natts_pg_inherits];
@@ -2365,6 +2367,8 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid,
values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(relationId);
values[Anum_pg_inherits_inhparent - 1] = ObjectIdGetDatum(parentOid);
values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber);
+ values[Anum_pg_inherits_inhpartitioned - 1] =
+ BoolGetDatum(child_is_partitioned);
memset(nulls, 0, sizeof(nulls));
@@ -2564,7 +2568,7 @@ renameatt_internal(Oid myrelid,
* outside the inheritance hierarchy being processed.
*/
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents);
+ &child_numparents, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -2591,7 +2595,7 @@ renameatt_internal(Oid myrelid,
* expected_parents will only be 0 if we are not already recursing.
*/
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2774,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
*li;
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents);
+ &child_numparents, NULL);
forboth(lo, child_oids, li, child_numparents)
{
@@ -2790,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
else
{
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4803,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL);
+ children = find_all_inheritors(relid, lockmode, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -5212,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
*/
if (colDef->identity &&
recurse &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5418,7 +5422,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
/*
* If we are told not to recurse, there had better not be any child
@@ -6537,7 +6542,8 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
if (children)
{
@@ -6971,7 +6977,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
/*
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -7692,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
*/
if (!recursing && !con->connoinherit)
children = find_all_inheritors(RelationGetRelid(rel),
- lockmode, NULL);
+ lockmode, NULL, NULL);
/*
* For CHECK constraints, we must ensure that we only mark the
@@ -8575,7 +8582,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
* use find_all_inheritors to do it in one pass.
*/
if (!is_no_inherit_constraint)
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
else
children = NIL;
@@ -8864,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL);
+ children = find_all_inheritors(relid, lockmode, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -8915,7 +8923,8 @@ ATPrepAlterColumnType(List **wqueue,
}
}
else if (!recursing &&
- find_inheritance_children(RelationGetRelid(rel), NoLock) != NIL)
+ find_inheritance_children(RelationGetRelid(rel),
+ NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11027,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
* We use weakest lock we can on child's children, namely AccessShareLock.
*/
children = find_all_inheritors(RelationGetRelid(child_rel),
- AccessShareLock, NULL);
+ AccessShareLock, NULL, NULL);
if (list_member_oid(children, RelationGetRelid(parent_rel)))
ereport(ERROR,
@@ -11136,6 +11145,8 @@ CreateInheritance(Relation child_rel, Relation parent_rel)
inhseqno + 1,
catalogRelation,
parent_rel->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE,
+ child_rel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE);
/* Now we're done with pg_inherits */
@@ -13696,7 +13707,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
* weaker lock now and the stronger one only when needed.
*/
attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
- AccessExclusiveLock, NULL);
+ AccessExclusiveLock, NULL,
+ NULL);
if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_TABLE),
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa181207a..e2e5ffce42 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,7 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
oldcontext = MemoryContextSwitchTo(vac_context);
if (include_parts)
oid_list = list_concat(oid_list,
- find_all_inheritors(relid, NoLock, NULL));
+ find_all_inheritors(relid, NoLock, NULL,
+ NULL));
else
oid_list = lappend_oid(oid_list, relid);
MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index a03188aba3..4424649769 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,7 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
* Get the information about the partition tree after locking all the
* partitions.
*/
- (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
+ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
+ NULL);
RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts);
/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 68d0d8efa3..b84d6c8878 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
/*
* Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 26bfab5db6..9f59c017e7 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -30,9 +30,20 @@
CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
{
+ /* OID of the child table. */
Oid inhrelid;
+
+ /* OID of the parent table. */
Oid inhparent;
+
+ /*
+ * Sequence number (starting with 1) of this parent, if this child table
+ * has multiple parents.
+ */
int32 inhseqno;
+
+ /* true if the child is a partitioned table, false otherwise. */
+ bool inhpartitioned;
} FormData_pg_inherits;
/* ----------------
@@ -46,10 +57,11 @@ typedef FormData_pg_inherits *Form_pg_inherits;
* compiler constants for pg_inherits
* ----------------
*/
-#define Natts_pg_inherits 3
-#define Anum_pg_inherits_inhrelid 1
-#define Anum_pg_inherits_inhparent 2
-#define Anum_pg_inherits_inhseqno 3
+#define Natts_pg_inherits 4
+#define Anum_pg_inherits_inhrelid 1
+#define Anum_pg_inherits_inhparent 2
+#define Anum_pg_inherits_inhseqno 3
+#define Anum_pg_inherits_inhpartitioned 4
/* ----------------
* pg_inherits has no initial contents
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 7743388899..8f371acae7 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -17,9 +17,10 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
-extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
+extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ int *num_partitioned_children);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
- List **parents);
+ List **parents, int *num_partitioned_children);
extern bool has_subclass(Oid relationId);
extern bool has_superclass(Oid relationId);
extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
--
2.11.0
0002-Allow-locking-only-partitioned-children-in-partition.patch (text/plain; charset=UTF-8)
From ef86d03a6ed6ac0cdbdede0c1012f9006ed24de2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 10 Aug 2017 17:59:18 +0900
Subject: [PATCH 2/3] Allow locking only partitioned children in partition tree
find_inheritance_children will still return the OIDs of the
non-partitioned children, but will not lock them if the caller so
requests.
None of the callers pass 'true' yet though.
---
contrib/sepgsql/dml.c | 3 ++-
src/backend/catalog/partition.c | 3 ++-
src/backend/catalog/pg_inherits.c | 20 ++++++++++++++++----
src/backend/commands/analyze.c | 4 ++--
src/backend/commands/lockcmds.c | 2 +-
src/backend/commands/publicationcmds.c | 2 +-
src/backend/commands/tablecmds.c | 34 +++++++++++++++++-----------------
src/backend/commands/vacuum.c | 4 ++--
src/backend/executor/execMain.c | 4 ++--
src/backend/optimizer/prep/prepunion.c | 2 +-
src/include/catalog/pg_inherits_fn.h | 2 ++
11 files changed, 48 insertions(+), 32 deletions(-)
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index 6fc279805c..91f338f8bf 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,8 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
if (!rte->inh)
tableIds = list_make1_oid(rte->relid);
else
- tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
+ tableIds = find_all_inheritors(rte->relid, NoLock, false,
+ NULL, NULL);
foreach(li, tableIds)
{
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 36f5c80b4f..c972760fe4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,7 +196,8 @@ RelationBuildPartitionDesc(Relation rel)
return;
/* Get partition oids from pg_inherits */
- inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
+ inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, false,
+ NULL);
/* Collect bound spec nodes in a list */
i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 5292ec8058..72420f65f1 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -74,13 +74,16 @@ inhchildinfo_cmp(const void *p1, const void *p2)
* Returns a list containing the OIDs of all relations which
* inherit *directly* from the relation with OID 'parentrelId'.
*
- * The specified lock type is acquired on each child relation (but not on the
- * given rel; caller should already have locked it). If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * The specified lock type is acquired on each child relation (but not on the
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is specified in which case only partitioned
+ * children are locked. If lockmode is NoLock then no locks are acquired, but
+ * caller must beware of race conditions against possible DROPs of child
+ * relations.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
int *num_partitioned_children)
{
List *list = NIL;
@@ -162,6 +165,13 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
{
inhrelid = inhchildren[i].relid;
+ /* If requested, skip locking non-partitioned children. */
+ if (lock_only_partitioned_children && i >= *num_partitioned_children)
+ {
+ list = lappend_oid(list, inhrelid);
+ continue;
+ }
+
if (lockmode != NoLock)
{
/* Get the lock to synchronize against concurrent drop */
@@ -212,6 +222,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
*/
List *
find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
List **numparents, int *num_partitioned_children)
{
/* hash table for O(1) rel_oid -> rel_numparents cell lookup */
@@ -264,6 +275,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
/* Get the direct children of this rel */
currentchildren = find_inheritance_children(currentrel, lockmode,
+ lock_only_partitioned_children,
&cur_num_partitioned_children);
my_num_partitioned_children += cur_num_partitioned_children;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 10cc2b8314..4bd374632f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,8 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
* the children.
*/
tableOIDs =
- find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
- NULL);
+ find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, false,
+ NULL, NULL);
/*
* Check that there's at least one descendant, else fail. This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 529f244f7e..771aa11b1c 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
List *children;
ListCell *lc;
- children = find_inheritance_children(reloid, NoLock, NULL);
+ children = find_inheritance_children(reloid, NoLock, false, NULL);
foreach(lc, children)
{
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 64179ea3ef..4315028c66 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
List *children;
children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
- NULL, NULL);
+ false, NULL, NULL);
foreach(child, children)
{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4d686a6f71..ef3869854a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1239,8 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
ListCell *child;
List *children;
- children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
- NULL);
+ children = find_all_inheritors(myrelid, AccessExclusiveLock, false,
+ NULL, NULL);
foreach(child, children)
{
@@ -2567,7 +2567,7 @@ renameatt_internal(Oid myrelid,
* calls to renameatt() can determine whether there are any parents
* outside the inheritance hierarchy being processed.
*/
- child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
+ child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, false,
&child_numparents, NULL);
/*
@@ -2595,7 +2595,7 @@ renameatt_internal(Oid myrelid,
* expected_parents will only be 0 if we are not already recursing.
*/
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2778,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
*li;
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents, NULL);
+ false, &child_numparents, NULL);
forboth(lo, child_oids, li, child_numparents)
{
@@ -2794,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
else
{
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4807,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL, NULL);
+ children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -5216,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
*/
if (colDef->identity &&
recurse &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5423,7 +5423,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
/*
* If we are told not to recurse, there had better not be any child
@@ -6543,7 +6543,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
if (children)
{
@@ -6978,7 +6978,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
/*
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -7699,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
*/
if (!recursing && !con->connoinherit)
children = find_all_inheritors(RelationGetRelid(rel),
- lockmode, NULL, NULL);
+ lockmode, false, NULL, NULL);
/*
* For CHECK constraints, we must ensure that we only mark the
@@ -8583,7 +8583,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
*/
if (!is_no_inherit_constraint)
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
else
children = NIL;
@@ -8872,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL, NULL);
+ children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -8924,7 +8924,7 @@ ATPrepAlterColumnType(List **wqueue,
}
else if (!recursing &&
find_inheritance_children(RelationGetRelid(rel),
- NoLock, NULL) != NIL)
+ NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11036,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
* We use weakest lock we can on child's children, namely AccessShareLock.
*/
children = find_all_inheritors(RelationGetRelid(child_rel),
- AccessShareLock, NULL, NULL);
+ AccessShareLock, false, NULL, NULL);
if (list_member_oid(children, RelationGetRelid(parent_rel)))
ereport(ERROR,
@@ -13707,7 +13707,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
* weaker lock now and the stronger one only when needed.
*/
attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
- AccessExclusiveLock, NULL,
+ AccessExclusiveLock, false, NULL,
NULL);
if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
ereport(ERROR,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e2e5ffce42..70cd5721f3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,8 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
oldcontext = MemoryContextSwitchTo(vac_context);
if (include_parts)
oid_list = list_concat(oid_list,
- find_all_inheritors(relid, NoLock, NULL,
- NULL));
+ find_all_inheritors(relid, NoLock, false,
+ NULL, NULL));
else
oid_list = lappend_oid(oid_list, relid);
MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4424649769..63529ab1dd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,8 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
* Get the information about the partition tree after locking all the
* partitions.
*/
- (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
- NULL);
+ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, false,
+ NULL, NULL);
RelationGetPartitionDispatchInfo(rel, &ptinfos, &leaf_parts);
/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b84d6c8878..ee2e066263 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
/*
* Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 8f371acae7..e568d11e43 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -18,8 +18,10 @@
#include "storage/lock.h"
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
int *num_partitioned_children);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
List **parents, int *num_partitioned_children);
extern bool has_subclass(Oid relationId);
extern bool has_superclass(Oid relationId);
--
2.11.0
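
To illustrate the effect of 0002 on the locking loop, here is a minimal standalone sketch. It is not PostgreSQL code: collect_children(), its parameters, and the plain unsigned int relids are hypothetical stand-ins, and the actual LockRelationOid() call is replaced by a flag. It assumes, as the test `i >= *num_partitioned_children` in the patch implies, that the children array is sorted with the partitioned children first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the loop in find_inheritance_children().
 *
 * children[] is assumed to hold the partitioned children in its first
 * num_partitioned slots.  Every child's relid is copied to out_relids[];
 * locked[i] records whether child i would have been locked.
 * Returns the number of children that would have been locked.
 */
static size_t
collect_children(const unsigned int *children, size_t nchildren,
                 size_t num_partitioned, bool take_locks,
                 bool lock_only_partitioned_children,
                 unsigned int *out_relids, bool *locked)
{
    size_t nlocked = 0;

    assert(num_partitioned <= nchildren);

    for (size_t i = 0; i < nchildren; i++)
    {
        /* The relid is always returned, whether or not it is locked. */
        out_relids[i] = children[i];
        locked[i] = false;

        /* If requested, skip locking non-partitioned children. */
        if (lock_only_partitioned_children && i >= num_partitioned)
            continue;

        /* Stands in for "lockmode != NoLock" followed by the lock call. */
        if (take_locks)
        {
            locked[i] = true;
            nlocked++;
        }
    }

    return nlocked;
}
```

With four children of which the first two are partitioned, passing true for lock_only_partitioned_children still returns all four relids but marks only the first two as locked. Passing false locks all four, which is what every caller does as of this patch.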
0003-WIP-Defer-opening-and-locking-partitions-to-set_appe.patch (text/plain; charset=UTF-8)
From 49582f6707611a572b441bf692fd925e9d658781 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 26 Jul 2017 14:42:47 +0900
Subject: [PATCH 3/3] WIP: Defer opening and locking partitions to
set_append_rel_size
---
src/backend/catalog/partition.c | 20 ++
src/backend/nodes/copyfuncs.c | 17 --
src/backend/nodes/equalfuncs.c | 12 --
src/backend/nodes/outfuncs.c | 57 +++++-
src/backend/optimizer/path/allpaths.c | 357 +++++++++++++++++++++++++++++++--
src/backend/optimizer/plan/planner.c | 106 ++++++++--
src/backend/optimizer/prep/prepunion.c | 266 +++++++++++++++---------
src/backend/optimizer/util/plancat.c | 44 ++++
src/backend/optimizer/util/relnode.c | 81 +++++++-
src/backend/utils/cache/lsyscache.c | 50 +++++
src/include/catalog/partition.h | 4 +
src/include/nodes/nodes.h | 5 +-
src/include/nodes/relation.h | 93 +++++++--
src/include/optimizer/plancat.h | 1 +
src/include/optimizer/prep.h | 3 +
src/include/utils/lsyscache.h | 2 +
src/test/regress/expected/insert.out | 4 +-
17 files changed, 938 insertions(+), 184 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index c972760fe4..41127a584e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1161,6 +1161,26 @@ RelationGetPartitionDispatchInfo(Relation rel,
Assert((offset + 1) == list_length(*ptinfos));
}
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes (from pd->indexes) of the partitions that
+ * will need to be scanned for the given scan keys.
+ *
+ * TODO: add the interface to pass the query scan keys and the logic to look
+ * up partitions using those keys.
+ */
+List *
+get_partitions_for_keys(PartitionDispatch pd)
+{
+ int i;
+ List *result = NIL;
+
+ for (i = 0; i < pd->partdesc->nparts; i++)
+ result = lappend_int(result, pd->indexes[i]);
+
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 72041693df..8d17d7f52c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2249,20 +2249,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -4994,9 +4980,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d92c03633..fb248f31f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -905,15 +905,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3155,9 +3146,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 5ce3c7c599..1c7caca013 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2211,7 +2211,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
+ WRITE_NODE_FIELD(prinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2285,6 +2285,12 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_INT_FIELD(num_parted);
+ /* don't bother printing partition_infos */
+ WRITE_INT_FIELD(num_leaf_parts);
+ /* don't bother printing leaf_part_infos */
+ WRITE_NODE_FIELD(live_partition_painfos);
+ WRITE_UINT_FIELD(root_parent_relid);
}
static void
@@ -2510,12 +2516,42 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
+_outPartitionInfo(StringInfo str, const PartitionInfo *node)
{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
+ WRITE_NODE_TYPE("PARTITIONINFO");
+
+ WRITE_UINT_FIELD(relid);
+ /* Don't bother writing out the PartitionDispatch object */
+}
+
+static void
+_outLeafPartitionInfo(StringInfo str, const LeafPartitionInfo *node)
+{
+ WRITE_NODE_TYPE("LEAFPARTITIONINFO");
+
+ WRITE_OID_FIELD(reloid);
+ WRITE_UINT_FIELD(relid);
+}
+
+static void
+_outPartitionAppendInfo(StringInfo str, const PartitionAppendInfo *node)
+{
+ WRITE_NODE_TYPE("PARTITIONAPPENDINFO");
+
+ WRITE_UINT_FIELD(parent_relid);
+ WRITE_NODE_FIELD(live_partition_relids);
+}
+
+static void
+_outPartitionRootInfo(StringInfo str, const PartitionRootInfo *node)
+{
+ WRITE_NODE_TYPE("PARTITIONROOTINFO");
WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
+ WRITE_NODE_FIELD(partition_infos);
+ WRITE_NODE_FIELD(partitioned_relids);
+ WRITE_NODE_FIELD(leaf_part_infos);
+ WRITE_NODE_FIELD(orig_leaf_part_oids);
}
static void
@@ -4043,8 +4079,17 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
+ case T_PartitionInfo:
+ _outPartitionInfo(str, obj);
+ break;
+ case T_LeafPartitionInfo:
+ _outLeafPartitionInfo(str, obj);
+ break;
+ case T_PartitionAppendInfo:
+ _outPartitionAppendInfo(str, obj);
+ break;
+ case T_PartitionRootInfo:
+ _outPartitionRootInfo(str, obj);
break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..c9c0b85cd9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -43,6 +44,8 @@
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -334,7 +337,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
*/
set_dummy_rel_pathlist(rel);
}
- else if (rte->inh)
+ else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE)
{
/* It's an "append relation", process accordingly */
set_append_rel_size(root, rel, rti, rte);
@@ -425,7 +428,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
{
/* We already proved the relation empty, so nothing more to do */
}
- else if (rte->inh)
+ else if (rte->inh || rte->relkind == RELKIND_PARTITIONED_TABLE)
{
/* It's an "append relation", process accordingly */
set_append_rel_pathlist(root, rel, rti, rte);
@@ -845,6 +848,166 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_rel_partitions_recurse
+ * Find partitions of the partitioned table described in partinfo,
+ * recursing for those partitions that are themselves partitioned tables
+ *
+ * rootrel is the root of the partition tree of which this table is a part.
+ * We create a PartitionAppendInfo for this partitioned table and append it to
+ * rootrel->live_partition_painfos.
+ *
+ * List of the leaf partitions of this table will be returned.
+ */
+static List *
+get_rel_partitions_recurse(RelOptInfo *rootrel,
+ PartitionInfo *partinfo,
+ PartitionInfo **all_partinfos,
+ LeafPartitionInfo **leaf_part_infos)
+{
+ PartitionAppendInfo *painfo;
+ List *indexes;
+ List *result = NIL,
+ *my_live_partitions = NIL;
+ ListCell *l;
+
+ /*
+ * Create a PartitionAppendInfo to map this table to the child tables
+ * that will be its Append children.
+ */
+ painfo = makeNode(PartitionAppendInfo);
+ painfo->parent_relid = partinfo->relid;
+
+ /* They will all be under the root table's Append node. */
+ rootrel->live_partition_painfos = lappend(rootrel->live_partition_painfos,
+ painfo);
+
+ /*
+ * TODO: collect the keys by looking at the clauses in
+ * rootrel->baserestrictinfo considering this table's partition keys.
+ */
+
+ /* Ask partition.c which partitions it thinks match the keys. */
+ indexes = get_partitions_for_keys(partinfo->pd);
+
+ /* Collect leaf partitions in the result list and recurse for others. */
+ foreach(l, indexes)
+ {
+ int index = lfirst_int(l);
+
+ if (index >= 0)
+ {
+ LeafPartitionInfo *lpinfo = leaf_part_infos[index];
+
+ result = lappend_oid(result, lpinfo->reloid);
+ my_live_partitions = lappend_int(my_live_partitions,
+ lpinfo->relid);
+ }
+ else
+ {
+ PartitionInfo *recurse_partinfo = all_partinfos[-index];
+ List *my_leaf_partitions;
+
+ my_live_partitions = lappend_int(my_live_partitions,
+ recurse_partinfo->relid);
+ my_leaf_partitions = get_rel_partitions_recurse(rootrel,
+ recurse_partinfo,
+ all_partinfos,
+ leaf_part_infos);
+ result = list_concat(result, my_leaf_partitions);
+ }
+ }
+
+ painfo->live_partition_relids = my_live_partitions;
+
+ return result;
+}
+
+/*
+ * get_rel_partitions
+ * Recursively find partitions of rel
+ */
+static List *
+get_rel_partitions(RelOptInfo *rel)
+{
+ return get_rel_partitions_recurse(rel,
+ rel->partition_infos[0],
+ rel->partition_infos,
+ rel->leaf_part_infos);
+}
+
+/*
+ * find_partitions_for_query
+ * Find and lock partitions of rel relevant to this query
+ *
+ * Note that we only ever need to lock the leaf partitions, because the
+ * partitioned tables in the partition tree have already been locked.
+ */
+static void
+find_partitions_for_query(PlannerInfo *root, RelOptInfo *rel)
+{
+ List *leaf_part_oids = NIL;
+ ListCell *l;
+ PlanRowMark *rc = NULL;
+ int lockmode;
+ int num_leaf_parts,
+ i;
+ Oid *leaf_part_oids_array;
+ PartitionRootInfo *prinfo = NULL;
+
+ /* Find partitions. */
+ Assert(rel->partition_infos != NULL);
+ leaf_part_oids = get_rel_partitions(rel);
+
+ /* Convert the list to an array and sort for binary searching later. */
+ num_leaf_parts = list_length(leaf_part_oids);
+ leaf_part_oids_array = (Oid *) palloc(num_leaf_parts * sizeof(Oid));
+ i = 0;
+ foreach(l, leaf_part_oids)
+ {
+ leaf_part_oids_array[i++] = lfirst_oid(l);
+ }
+ qsort(leaf_part_oids_array, num_leaf_parts, sizeof(Oid), oid_cmp);
+
+ /*
+ * Now lock partitions. Note that rel cannot be a result relation or we
+ * wouldn't be here (inheritance_planner is where result relations go).
+ */
+ rc = get_plan_rowmark(root->rowMarks, rel->relid);
+ if (rc && RowMarkRequiresRowShareLock(rc->markType))
+ lockmode = RowShareLock;
+ else
+ lockmode = AccessShareLock;
+
+ /*
+ * We lock leaf partitions in the order in which find_all_inheritors
+ * found them in expand_inherited_rtentry(). Find that list by locating
+ * the PartitionRootInfo for this table.
+ */
+ foreach(l, root->prinfo_list)
+ {
+ prinfo = lfirst(l);
+
+ if (rel->relid == prinfo->parent_relid)
+ break;
+ }
+ Assert(prinfo != NULL && rel->relid == prinfo->parent_relid);
+ foreach(l, prinfo->orig_leaf_part_oids)
+ {
+ Oid relid = lfirst_oid(l);
+ Oid *test;
+
+ /* Will this leaf partition be scanned? */
+ test = (Oid *) bsearch(&relid,
+ leaf_part_oids_array,
+ num_leaf_parts,
+ sizeof(Oid), oid_cmp);
+ /* Yep, so lock. */
+ if (test != NULL)
+ LockRelationOid(relid, lockmode);
+ }
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -866,6 +1029,134 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ List *rel_appinfos = NIL;
+
+ /*
+ * Collect a list of child AppendRelInfo's, which in the non-partitioned
+ * case will be found in root->append_rel_list. In the partitioned
+ * table's case, we didn't build any AppendRelInfo's yet. We will
+ * do the same after figuring out which of the table's child tables
+ * (aka partitions) will need to be scanned for this query.
+ */
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ List *live_partitions = NIL;
+ Relation parent;
+ List *parent_vars;
+ RelOptInfo *rootrel;
+
+ /*
+ * If this is a partitioned table root, we will determine all the
+ * partitions in this partition tree that we need to scan for this
+ * query. Among those, the partitions that have not yet been locked
+ * (viz. the leaf partitions) will be locked.
+ */
+ if (rel->partition_infos != NULL)
+ {
+ PartitionAppendInfo *painfo;
+
+ rootrel = rel;
+ find_partitions_for_query(root, rel);
+ painfo = linitial(rel->live_partition_painfos);
+ Assert(rti == painfo->parent_relid);
+ live_partitions = painfo->live_partition_relids;
+ }
+ else
+ {
+ /*
+ * Just need to get hold of the PartitionAppendInfo via the root
+ * parent's RelOptInfo.
+ */
+ rootrel = root->simple_rel_array[rel->root_parent_relid];
+ foreach(l, rootrel->live_partition_painfos)
+ {
+ PartitionAppendInfo *painfo = lfirst(l);
+
+ if (rti == painfo->parent_relid)
+ {
+ live_partitions = painfo->live_partition_relids;
+ break;
+ }
+ }
+ }
+
+ /*
+ * Create an AppendRelInfo and a RelOptInfo for every candidate
+ * partition.
+ */
+ parent = heap_open(rte->relid, NoLock);
+ parent_vars = build_rel_vars(rte, rti);
+ foreach(l, live_partitions)
+ {
+ Index childRTindex = lfirst_int(l);
+ RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+ Relation child;
+ AppendRelInfo *appinfo;
+ RelOptInfo *childrel;
+
+ child = heap_open(childrte->relid, NoLock); /* already locked! */
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = rti;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = parent->rd_rel->reltype;
+ appinfo->child_reltype = child->rd_rel->reltype;
+ appinfo->translated_vars = map_partition_varattnos(parent_vars,
+ rti,
+ child, parent,
+ NULL);
+ ChangeVarNodes((Node *) appinfo->translated_vars,
+ rti, childRTindex, 0);
+ appinfo->parent_reloid = rte->relid;
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this). But if this is the parent table, leave copyObject's
+ * result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ childrte->selectedCols = translate_col_privs(rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols = translate_col_privs(rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(rte->updatedCols,
+ appinfo->translated_vars);
+
+ childrel = build_simple_rel(root, childRTindex, rel);
+ childrel->root_parent_relid = rootrel->relid;
+ Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+
+ /* Copy the data that create_lateral_join_info() created */
+ Assert(childrel->direct_lateral_relids == NULL);
+ childrel->direct_lateral_relids = rel->direct_lateral_relids;
+ Assert(childrel->lateral_relids == NULL);
+ childrel->lateral_relids = rel->lateral_relids;
+ Assert(childrel->lateral_referencers == NULL);
+ childrel->lateral_referencers = rel->lateral_referencers;
+
+ root->total_table_pages += childrel->pages;
+
+ heap_close(child, NoLock);
+ }
+ heap_close(parent, NoLock);
+ }
Assert(IS_SIMPLE_REL(rel));
@@ -889,7 +1180,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -902,10 +1193,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1211,24 +1498,61 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
int parentRTindex = rti;
List *live_childrels = NIL;
ListCell *l;
+ List *append_rel_children = NIL;
+
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ append_rel_children = lappend_int(append_rel_children,
+ appinfo->child_relid);
+ }
+ }
+ else
+ {
+ /* For a partitioned table, first find its PartitionAppendInfo */
+ if (rel->live_partition_painfos != NIL)
+ {
+ PartitionAppendInfo *painfo;
+
+ /* This is the root partitioned rel. */
+ painfo = linitial(rel->live_partition_painfos);
+ append_rel_children = painfo->live_partition_relids;
+ }
+ else
+ {
+ RelOptInfo *rootrel;
+
+ /* Non-root partitioned table. Get it from the root rel. */
+ rootrel = root->simple_rel_array[rel->root_parent_relid];
+ foreach(l, rootrel->live_partition_painfos)
+ {
+ PartitionAppendInfo *painfo = lfirst(l);
+
+ if (rti == painfo->parent_relid)
+ {
+ append_rel_children = painfo->live_partition_relids;
+ break;
+ }
+ }
+ }
+ }
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, append_rel_children)
{
- AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
- int childRTindex;
+ int childRTindex = lfirst_int(l);
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
- childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
childrel = root->simple_rel_array[childRTindex];
@@ -1289,7 +1613,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte;
rte = planner_rt_fetch(rel->relid, root);
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ /* Note that only a root partitioned table would have inh flag set. */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE && rte->inh)
{
partitioned_rels = get_partitioned_child_rels(root, rel->relid);
/* The root partitioned table is included as a child rel */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fdef00ab39..09dd32de79 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -514,7 +514,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
+ root->prinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -1050,6 +1050,93 @@ inheritance_planner(PlannerInfo *root)
Index rti;
RangeTblEntry *parent_rte;
List *partitioned_rels = NIL;
+ List *rel_appinfos = NIL;
+ ListCell *l;
+
+ parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
+ if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ PartitionRootInfo *prinfo = NULL;
+ Relation parent;
+ List *parent_vars = build_rel_vars(parent_rte, parentRTindex);
+
+ /* Find the PartitionRootInfo for this rel */
+ foreach(l, root->prinfo_list)
+ {
+ prinfo = lfirst(l);
+
+ if (prinfo->parent_relid == parentRTindex)
+ break;
+ }
+ Assert(prinfo != NULL && prinfo->parent_relid == parentRTindex);
+
+ parent = heap_open(parent_rte->relid, NoLock);
+ foreach(l, prinfo->leaf_part_infos)
+ {
+ LeafPartitionInfo *lpinfo = lfirst(l);
+ Index childRTindex = lpinfo->relid;
+ RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+ Relation child;
+ AppendRelInfo *appinfo;
+
+ if (childrte->relkind == RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ /*
+ * We'll need RowExclusiveLock, because just like the parent, each
+ * child is a result relation.
+ */
+ child = heap_open(childrte->relid, RowExclusiveLock);
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = parentRTindex;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = parent->rd_rel->reltype;
+ appinfo->child_reltype = child->rd_rel->reltype;
+ appinfo->translated_vars = map_partition_varattnos(parent_vars,
+ parentRTindex,
+ child, parent,
+ NULL);
+ ChangeVarNodes((Node *) appinfo->translated_vars,
+ parentRTindex, childRTindex, 0);
+ appinfo->parent_reloid = RelationGetRelid(parent);
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this).
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ childrte->selectedCols =
+ translate_col_privs(parent_rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols =
+ translate_col_privs(parent_rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols =
+ translate_col_privs(parent_rte->updatedCols,
+ appinfo->translated_vars);
+ heap_close(child, NoLock);
+ }
+ heap_close(parent, NoLock);
+ }
Assert(parse->commandType != CMD_INSERT);
@@ -1115,14 +1202,13 @@ inheritance_planner(PlannerInfo *root)
* opposite in the case of non-partitioned inheritance parent as described
* below.
*/
- parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
nominalRelation = parentRTindex;
/*
* And now we can get on with generating a plan for each child table.
*/
- foreach(lc, root->append_rel_list)
+ foreach(lc, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo *subroot;
@@ -1130,10 +1216,6 @@ inheritance_planner(PlannerInfo *root)
RelOptInfo *sub_final_rel;
Path *subpath;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/*
* We need a working copy of the PlannerInfo so that we can control
* propagation of information back to the main copy.
@@ -6070,7 +6152,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
* Returns a list of the RT indexes of the partitioned child relations
* with rti as the root parent RT index.
*
- * Note: Only call this function on RTEs known to be partitioned tables.
+ * Note: Only call this function on an RTE known to be a root partitioned table.
*/
List *
get_partitioned_child_rels(PlannerInfo *root, Index rti)
@@ -6078,13 +6160,13 @@ get_partitioned_child_rels(PlannerInfo *root, Index rti)
List *result = NIL;
ListCell *l;
- foreach(l, root->pcinfo_list)
+ foreach(l, root->prinfo_list)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ PartitionRootInfo *prinfo = lfirst(l);
- if (pc->parent_relid == rti)
+ if (prinfo->parent_relid == rti)
{
- result = pc->child_rels;
+ result = prinfo->partitioned_relids;
break;
}
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index ee2e066263..4b4d95eb63 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,6 @@ static void make_inh_translation_list(Relation oldrelation,
Relation newrelation,
Index newvarno,
List **translated_vars);
-static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
- List *translated_vars);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
static Relids adjust_child_relids(Relids relids, int nappinfos,
@@ -1352,11 +1350,19 @@ expand_inherited_tables(PlannerInfo *root)
/*
* expand_inherited_rtentry
- * Check whether a rangetable entry represents an inheritance set.
- * If so, add entries for all the child tables to the query's
- * rangetable, and build AppendRelInfo nodes for all the child tables
- * and add them to root->append_rel_list. If not, clear the entry's
- * "inh" flag to prevent later code from looking for AppendRelInfos.
+ * Perform actions necessary for applying this query to an inheritance
+ * set if the rte represents one
+ *
+ * That includes adding entries for all the child tables to the query's
+ * rangetable. Also, if this query requires a PlanRowMark, generate the same
+ * for each child table and append them to the planner's global list
+ * (root->rowMarks). If the inheritance set is really a partitioned table,
+ * our work here is done. If not, we also create AppendRelInfo nodes for
+ * all the child tables and add them to root->append_rel_list.
+ *
+ * If it turns out that the rte is not (or no longer) an inheritance set,
+ * clear the entry's "inh" flag to prevent later code from looking for
+ * AppendRelInfos.
*
* Note that the original RTE is considered to represent the whole
* inheritance set. The first of the generated RTEs is an RTE for the same
@@ -1381,9 +1387,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
List *inhOIDs;
List *appinfos;
ListCell *l;
- bool has_child;
- PartitionedChildRelInfo *pcinfo;
List *partitioned_child_rels = NIL;
+ List *partition_infos = NIL;
+ List *leaf_part_infos = NIL;
+ List *orig_leaf_part_oids;
+ int num_partitioned_children;
+ PartitionedTableInfo *ptinfo;
+ PartitionInfo *pinfo;
/* Does RT entry allow inheritance? */
if (!rte->inh)
@@ -1408,6 +1418,11 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* relation named in the query. However, for each child relation we add
* to the query, we must obtain an appropriate lock, because this will be
* the first use of those relations in the parse/rewrite/plan pipeline.
+ * For a partitioned table, we defer locking non-partitioned child tables
+ * until we actually know that they will be scanned. (See below: we use
+ * RelationGetPartitionDispatchInfo() to get the list of child tables of
+ * partitioned tables, not find_all_inheritors(), which would lock the
+ * child tables.)
*
* If the parent relation is the query's result relation, then we need
* RowExclusiveLock. Otherwise, if it's accessed FOR UPDATE/SHARE, we
@@ -1425,7 +1440,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, true, NULL,
+ &num_partitioned_children);
/*
* Check that there's at least one descendant, else treat as no-child
@@ -1461,9 +1477,17 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
{
List *leaf_part_oids,
*ptinfos;
+ int rtable_length = list_length(parse->rtable),
+ i;
+
+ /*
+ * Keep leaf partition OIDs around so that we can lock them in this
+ * order when we eventually do it.
+ */
+ orig_leaf_part_oids = list_copy_tail(inhOIDs,
+ num_partitioned_children + 1);
- /* Discard the original list. */
- list_free(inhOIDs);
+ /* Discard the original inhOIDs list. */
inhOIDs = NIL;
/* Request partitioning information. */
@@ -1471,14 +1495,37 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
&leaf_part_oids);
/*
- * First collect the partitioned child table OIDs, which includes the
- * root parent at the head.
+ * We make a PartitionInfo object for every partitioned table in the
+ * tree, including the root table. We create the root table's
+ * PartitionInfo outside the loop, because we'd like to use its
+ * original RT index, whereas for the child partitioned tables, we'll
+ * use their to-be RT indexes.
*/
+ ptinfo = linitial(ptinfos);
+ pinfo = makeNode(PartitionInfo);
+ pinfo->relid = rti;
+ pinfo->pd = ptinfo->pd;
+ partition_infos = list_make1(pinfo);
+
+ /* Let there remain only the child tables' PartitionedTableInfo's */
+ ptinfos = list_delete_first(ptinfos);
+
+ /*
+ * First collect the partitioned child table OIDs. Note that the list
+ * won't contain the root table's OID because we removed its ptinfo
+ * from the list above.
+ */
+ i = 1;
foreach(l, ptinfos)
{
PartitionedTableInfo *ptinfo = lfirst(l);
+ PartitionInfo *pinfo = makeNode(PartitionInfo);
inhOIDs = lappend_oid(inhOIDs, ptinfo->relid);
+ pinfo->relid = rtable_length + i;
+ pinfo->pd = ptinfo->pd;
+ partition_infos = lappend(partition_infos, pinfo);
+ i++;
}
/* Concatenate the leaf partition OIDs. */
@@ -1487,7 +1534,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
appinfos = NIL;
- has_child = false;
foreach(l, inhOIDs)
{
Oid childOID = lfirst_oid(l);
@@ -1496,23 +1542,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
Index childRTindex;
AppendRelInfo *appinfo;
- /* Open rel if needed; we already have required locks */
- if (childOID != parentOID)
- newrelation = heap_open(childOID, NoLock);
- else
- newrelation = oldrelation;
-
/*
* It is possible that the parent table has children that are temp
* tables of other backends. We cannot safely access such tables
* (because of buffering issues), and the best thing to do seems to be
* to silently ignore them.
*/
- if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
- {
- heap_close(newrelation, lockmode);
+ if (childOID != parentOID && rel_is_other_temp(childOID))
continue;
- }
/*
* Build an RTE for the child, and attach to query's rangetable list.
@@ -1528,7 +1565,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
*/
childrte = copyObject(rte);
childrte->relid = childOID;
- childrte->relkind = newrelation->rd_rel->relkind;
+ childrte->relkind = get_rel_relkind(childOID);
childrte->inh = false;
childrte->requiredPerms = 0;
childrte->securityQuals = NIL;
@@ -1536,51 +1573,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
childRTindex = list_length(parse->rtable);
/*
- * Build an AppendRelInfo for this parent and child, unless the child
- * is a partitioned table.
- */
- if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
- {
- /* Remember if we saw a real child. */
- if (childOID != parentOID)
- has_child = true;
-
- appinfo = makeNode(AppendRelInfo);
- appinfo->parent_relid = rti;
- appinfo->child_relid = childRTindex;
- appinfo->parent_reltype = oldrelation->rd_rel->reltype;
- appinfo->child_reltype = newrelation->rd_rel->reltype;
- make_inh_translation_list(oldrelation, newrelation, childRTindex,
- &appinfo->translated_vars);
- appinfo->parent_reloid = parentOID;
- appinfos = lappend(appinfos, appinfo);
-
- /*
- * Translate the column permissions bitmaps to the child's attnums
- * (we have to build the translated_vars list before we can do
- * this). But if this is the parent table, leave copyObject's
- * result alone.
- *
- * Note: we need to do this even though the executor won't run any
- * permissions checks on the child RTE. The
- * insertedCols/updatedCols bitmaps may be examined for
- * trigger-firing purposes.
- */
- if (childOID != parentOID)
- {
- childrte->selectedCols = translate_col_privs(rte->selectedCols,
- appinfo->translated_vars);
- childrte->insertedCols = translate_col_privs(rte->insertedCols,
- appinfo->translated_vars);
- childrte->updatedCols = translate_col_privs(rte->updatedCols,
- appinfo->translated_vars);
- }
- }
- else
- partitioned_child_rels = lappend_int(partitioned_child_rels,
- childRTindex);
-
- /*
* Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
*/
if (oldrc)
@@ -1604,12 +1596,78 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
*/
newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
- /* Include child's rowmark type in parent's allMarkTypes */
- oldrc->allMarkTypes |= newrc->allMarkTypes;
root->rowMarks = lappend(root->rowMarks, newrc);
}
+ /*
+ * No need to create an AppendRelInfo for a leaf partition at this
+ * point, because we don't know yet if it will actually be scanned by
+ * this query. The fact that this is a partition of the parent table
+ * will be recorded in the PartitionInfo created for the parent
+ * table.
+ */
+ if (rel_is_partition(childOID) &&
+ childrte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ LeafPartitionInfo *lpinfo = makeNode(LeafPartitionInfo);
+
+ lpinfo->reloid = childOID;
+ lpinfo->relid = childRTindex;
+ leaf_part_infos = lappend(leaf_part_infos, lpinfo);
+ continue;
+ }
+
+ if (childrte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ partitioned_child_rels = lappend_int(partitioned_child_rels,
+ childRTindex);
+ continue;
+ }
+
+ /*
+ * This must be a non-partitioned child table that is not a partition.
+ * Build an AppendRelInfo for the same to remember the parent-child
+ * relationship.
+ */
+
+ /* Open rel if needed, we already have required locks */
+ if (childOID != parentOID)
+ newrelation = heap_open(childOID, NoLock);
+ else
+ newrelation = oldrelation;
+
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = rti;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = oldrelation->rd_rel->reltype;
+ appinfo->child_reltype = newrelation->rd_rel->reltype;
+ make_inh_translation_list(oldrelation, newrelation, childRTindex,
+ &appinfo->translated_vars);
+ appinfo->parent_reloid = parentOID;
+ appinfos = lappend(appinfos, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this). But if this is the parent table, leave copyObject's
+ * result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ if (childOID != parentOID)
+ {
+ childrte->selectedCols = translate_col_privs(rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols = translate_col_privs(rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(rte->updatedCols,
+ appinfo->translated_vars);
+ }
+
/* Close child relations, but keep locks */
if (childOID != parentOID)
heap_close(newrelation, NoLock);
@@ -1618,35 +1676,53 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
heap_close(oldrelation, NoLock);
/*
- * If all the children were temp tables or a partitioned parent did not
- * have any leaf partitions, pretend it's a non-inheritance situation; we
- * don't need Append node in that case. The duplicate RTE we added for
- * the parent table is harmless, so we don't bother to get rid of it;
- * ditto for the useless PlanRowMark node.
+ * We keep a list of objects in root, each of which maps a partitioned
+ * parent RT index to a bunch of information about the partition tree
+ * rooted at that parent. The information includes a list of RT indexes
+ * of partitioned tables appearing in the tree, a list of PartitionInfo
+ * objects (one for each such partitioned table), a list of
+ * LeafPartitionInfo objects (one for each leaf partition in the tree),
+ * and finally a list containing leaf partition OIDs in the order in
+ * which find_all_inheritors() returned them. The first of these is
+ * copied verbatim into the Append or ModifyTable path created for the
+ * parent (and subsequently into the plan) so that it can be carried over
+ * to the executor. That list is the only place where the executor can
+ * find partitioned child tables to lock them.
*/
- if (!has_child)
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
- /* Clear flag before returning */
- rte->inh = false;
+ PartitionRootInfo *prinfo = makeNode(PartitionRootInfo);
+
+ Assert(list_length(partition_infos) >= 1);
+ prinfo->parent_relid = rti;
+ /*
+ * Be sure to include the parent's RT index, because the above code
+ * didn't.
+ */
+ prinfo->partitioned_relids = lcons_int(rti, partitioned_child_rels);
+ prinfo->partition_infos = partition_infos;
+ prinfo->leaf_part_infos = leaf_part_infos;
+ prinfo->orig_leaf_part_oids = orig_leaf_part_oids;
+
+ root->prinfo_list = lappend(root->prinfo_list, prinfo);
+
+ /*
+ * Our job here is done, because we didn't create any AppendRelInfos.
+ */
return;
}
/*
- * We keep a list of objects in root, each of which maps a partitioned
- * parent RT index to the list of RT indexes of its partitioned child
- * tables. When creating an Append or a ModifyTable path for the parent,
- * we copy the child RT index list verbatim to the path so that it could
- * be carried over to the executor so that the latter could identify the
- * partitioned child tables.
+ * If all the children were temp tables, pretend it's a non-inheritance
+ * situation; we don't need Append node in that case. The duplicate
+ * RTE we added for the parent table is harmless, so we don't bother to
+ * get rid of it; ditto for the useless PlanRowMark node.
*/
- if (partitioned_child_rels != NIL)
+ if (list_length(appinfos) < 2)
{
- pcinfo = makeNode(PartitionedChildRelInfo);
-
- Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
+ /* Clear flag before returning */
+ rte->inh = false;
+ return;
}
/* Otherwise, OK to add to root->append_rel_list */
@@ -1767,7 +1843,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
* query is really only going to reference the inherited columns. Instead
* we set the per-column bits for all inherited columns.
*/
-static Bitmapset *
+Bitmapset *
translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars)
{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..5607a4e4e0 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1577,6 +1577,50 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
}
/*
+ * build_rel_vars
+ *
+ * Returns a list containing Var expressions corresponding to a relation's
+ * attributes. Since the caller may already have the RangeTblEntry, we
+ * pass it instead of PlannerInfo to avoid finding it in the range
+ * table all over again.
+ */
+List *
+build_rel_vars(RangeTblEntry *rte, Index relid)
+{
+ Relation relation;
+ AttrNumber attrno;
+ int numattrs;
+ List *result = NIL;
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* Assume we already have adequate lock */
+ relation = heap_open(rte->relid, NoLock);
+
+ numattrs = RelationGetNumberOfAttributes(relation);
+ for (attrno = 1; attrno <= numattrs; attrno++)
+ {
+ Form_pg_attribute att_tup = TupleDescAttr(relation->rd_att,
+ attrno - 1);
+
+ if (att_tup->attisdropped)
+ continue;
+
+ result = lappend(result,
+ makeVar(relid,
+ attrno,
+ att_tup->atttypid,
+ att_tup->atttypmod,
+ att_tup->attcollation,
+ 0));
+
+ }
+
+ heap_close(relation, NoLock);
+ return result;
+}
+
+/*
* build_index_tlist
*
* Build a targetlist representing the columns of the specified index.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8ad0b4a669..4cc32dea8d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,7 +16,9 @@
#include <limits.h>
+#include "catalog/pg_class.h"
#include "miscadmin.h"
+#include "nodes/relation.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -146,6 +148,15 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->baserestrict_min_security = UINT_MAX;
rel->joininfo = NIL;
rel->has_eclass_joins = false;
+ /* Set in build_simple_rel if rel is root partitioned table */
+ rel->num_parted = 0;
+ rel->partition_infos = NULL;
+ rel->num_leaf_parts = 0;
+ rel->leaf_part_infos = NULL;
+ /* Set in get_rel_partitions_recurse */
+ rel->live_partition_painfos = NIL;
+ /* Set in set_append_rel_size if rel is a partition. */
+ rel->root_parent_relid = 0;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -210,25 +221,73 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
list_length(rte->securityQuals));
/*
- * If this rel is an appendrel parent, recurse to build "other rel"
- * RelOptInfos for its children. They are "other rels" because they are
- * not in the main join tree, but we will need RelOptInfos to plan access
- * to them.
+ * If this rel is an appendrel parent, generate additional information
+ * based on whether the parent is a partitioned table or not. For
+ * regular parent tables, recurse to build "other rel" RelOptInfos for its
+ * children. They are "other rels" because they are not in the main join
+ * tree, but we will need RelOptInfos to plan access to them. For
+ * partitioned parent tables, we do not yet create "other rel" RelOptInfos
+ * for the children. Instead, we set up some information that will be
+ * used in set_append_rel_size() to look up its partitions.
*/
if (rte->inh)
{
ListCell *l;
- foreach(l, root->append_rel_list)
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
- AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+ PartitionRootInfo *prinfo = NULL;
+ LeafPartitionInfo **lpinfos;
+ int i;
+
+ foreach(l, root->prinfo_list)
+ {
+ prinfo = lfirst(l);
+ if (prinfo->parent_relid == relid)
+ break;
+ }
+ Assert(prinfo != NULL && prinfo->parent_relid == relid);
+
+ rel->num_parted = list_length(prinfo->partition_infos);
+ rel->num_leaf_parts = list_length(prinfo->leaf_part_infos);
+ rel->partition_infos = (PartitionInfo **)
+ palloc0(rel->num_parted *
+ sizeof(PartitionInfo *));
+ lpinfos = (LeafPartitionInfo **) palloc0(rel->num_leaf_parts *
+ sizeof(LeafPartitionInfo *));
+ i = 0;
+ foreach(l, prinfo->partition_infos)
+ {
+ rel->partition_infos[i++] = lfirst(l);
+ }
+ i = 0;
+ foreach(l, prinfo->leaf_part_infos)
+ {
+ lpinfos[i++] = lfirst(l);
+ }
+ rel->leaf_part_infos = lpinfos;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != relid)
- continue;
+ /*
+ * Don't build RelOptInfo for partitions yet; we don't know which
+ * ones we'll need. We did create RangeTblEntry's though, so we
+ * have an empty slot in root->simple_rel_array that will be
+ * filled eventually if the respective partition is chosen to be
+ * scanned after all.
+ */
+ }
+ else
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid != relid)
+ continue;
- (void) build_simple_rel(root, appinfo->child_relid,
- rel);
+ (void) build_simple_rel(root, appinfo->child_relid,
+ rel);
+ }
}
}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8013..ebbc3da985 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1817,6 +1817,28 @@ get_rel_relkind(Oid relid)
}
/*
+ * rel_is_partition
+ *
+ * Returns whether a given relation is a partition.
+ */
+bool
+rel_is_partition(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relispartition;
+ ReleaseSysCache(tp);
+
+ return result;
+}
+
+/*
* get_rel_tablespace
*
* Returns the pg_tablespace OID associated with a given relation.
@@ -1865,6 +1887,34 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * rel_is_other_temp
+ *
+ * Returns whether a relation is a temp table from another session
+ */
+bool
+rel_is_other_temp(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result = false;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+
+ if (reltup->relpersistence == RELPERSISTENCE_TEMP &&
+ !isTempOrTempToastNamespace(reltup->relnamespace))
+ {
+ result = true;
+ }
+
+ ReleaseSysCache(tp);
+
+ return result;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 7b53baf847..b5dcb22688 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -16,6 +16,7 @@
#include "fmgr.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
+#include "nodes/relation.h"
#include "parser/parse_node.h"
#include "utils/rel.h"
@@ -87,4 +88,7 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
EState *estate,
PartitionTupleRoutingInfo **failed_at,
TupleTableSlot **failed_slot);
+
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(PartitionDispatch pd);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..e957615ac6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,10 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
+ T_PartitionInfo,
+ T_LeafPartitionInfo,
+ T_PartitionAppendInfo,
+ T_PartitionRootInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3ccc9d1b03..71c494a7c2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -251,7 +251,7 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
+ List *prinfo_list; /* list of PartitionRootInfos */
List *rowMarks; /* list of PlanRowMarks */
@@ -515,6 +515,9 @@ typedef enum RelOptKind
/* Is the given relation an "other" relation? */
#define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
+typedef struct PartitionInfo PartitionInfo;
+typedef struct LeafPartitionInfo LeafPartitionInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -592,6 +595,23 @@ typedef struct RelOptInfo
/* used by "other" relations */
Relids top_parent_relids; /* Relids of topmost parents */
+
+ /* Fields set for "root" partitioned relations */
+ int num_parted; /* Number of entries in partition_infos */
+ PartitionInfo **partition_infos;
+ int num_leaf_parts; /* Number of entries in leaf_part_infos */
+ LeafPartitionInfo **leaf_part_infos; /* LeafPartitionInfos */
+
+ /* Fields set for partitioned relations (list of PartitionAppendInfo's) */
+ List *live_partition_painfos;
+
+ /* Fields set for partition otherrels */
+
+ /*
+ * RT index of the root partitioned table in the partition tree of
+ * which this rel is a member.
+ */
+ Index root_parent_relid;
} RelOptInfo;
/*
@@ -2012,24 +2032,73 @@ typedef struct AppendRelInfo
Oid parent_reloid; /* OID of parent relation */
} AppendRelInfo;
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * PartitionInfo - information about partitioning of one partitioned table in
+ * a given partition tree
+ */
+typedef struct PartitionInfo
+{
+ NodeTag type;
+
+ Index relid; /* Ordinal position in the rangetable */
+ PartitionDispatch pd; /* Information about partitions */
+} PartitionInfo;
+
+/*
+ * LeafPartitionInfo - (OID, RT index) pair for one leaf partition
+ *
+ * Created when a leaf partition's RT entry is created in
+ * expand_inherited_rtentry().
+ */
+typedef struct LeafPartitionInfo
+{
+ NodeTag type;
+
+ Oid reloid; /* OID */
+ Index relid; /* RT index */
+} LeafPartitionInfo;
+
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
+ * PartitionAppendInfo - list of child RT indexes for one partitioned table
+ * in a given partition tree
+ */
+typedef struct PartitionAppendInfo
+{
+ NodeTag type;
+
+ Index parent_relid;
+ List *live_partition_relids; /* List of RT indexes */
+} PartitionAppendInfo;
+
+/*
+ * For a partitioned table, this maps its RT index to the information about
+ * the partition tree collected in expand_inherited_rtentry().
+ *
+ * That information includes a list of PartitionInfo nodes, one for each
+ * partitioned table in the partition tree, including for the table itself.
+ * Also included is a list of LeafPartitionInfo nodes, one for each leaf
+ * partition whose RT entry is created at the same time by
+ * expand_inherited_rtentry().
+ *
+ * orig_leaf_part_oids contains the list of leaf partition OIDs as it was
+ * generated by find_all_inheritors(). We keep it around so that we can
+ * lock leaf partitions in that order when we actually do it.
*
- * These structs are kept in the PlannerInfo node's pcinfo_list.
+ * PartitionRootInfo's for different partitioned tables in a query are placed
+ * in root->prinfo_list.
*/
-typedef struct PartitionedChildRelInfo
+typedef struct PartitionRootInfo
{
NodeTag type;
Index parent_relid;
- List *child_rels;
-} PartitionedChildRelInfo;
+ List *partition_infos;
+ List *partitioned_relids;
+ List *leaf_part_infos;
+ List *orig_leaf_part_oids;
+} PartitionRootInfo;
/*
* For each distinct placeholder expression generated during planning, we
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..1e18f609b1 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -39,6 +39,7 @@ extern bool relation_excluded_by_constraints(PlannerInfo *root,
RelOptInfo *rel, RangeTblEntry *rte);
extern List *build_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern List *build_rel_vars(RangeTblEntry *rte, Index relid);
extern bool has_unique_index(RelOptInfo *rel, AttrNumber attno);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 4be0afd566..d0af8dc7bc 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -16,6 +16,7 @@
#include "nodes/plannodes.h"
#include "nodes/relation.h"
+#include "utils/rel.h"
/*
@@ -51,6 +52,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
extern RelOptInfo *plan_set_operations(PlannerInfo *root);
extern void expand_inherited_tables(PlannerInfo *root);
+extern Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
+ List *translated_vars);
extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
int nappinfos, AppendRelInfo **appinfos);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07208b56ce..b5b615a6fa 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -126,8 +126,10 @@ extern char *get_rel_name(Oid relid);
extern Oid get_rel_namespace(Oid relid);
extern Oid get_rel_type_id(Oid relid);
extern char get_rel_relkind(Oid relid);
+extern bool rel_is_partition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool rel_is_other_temp(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index a2d9469592..e159d62b66 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -278,12 +278,12 @@ select tableoid::regclass, * from list_parted;
-------------+----+----
part_aa_bb | aA |
part_cc_dd | cC | 1
- part_null | | 0
- part_null | | 1
part_ee_ff1 | ff | 1
part_ee_ff1 | EE | 1
part_ee_ff2 | ff | 11
part_ee_ff2 | EE | 10
+ part_null | | 0
+ part_null | | 1
(8 rows)
-- some more tests to exercise tuple-routing with multi-level partitioning
--
2.11.0
On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.
Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful. They are to be applied on top of
the patches being discussed at [1]. Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning. I will share that patch after
debugging it some more.
The main design goal of the patches I'm sharing here now is to defer the
locking and opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child. ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time. Although we don't lock and open the
partition child tables in expand_inherited_rtentry(), their RT entries are
still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocate the memory for that
array accordingly. Slots in simple_rel_array[] corresponding to
partition child tables will be empty until they are created when
set_append_rel_size() is called on the root parent table and it determines
the partitions that will be scanned after all.
The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.
There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here: 1. Add placeholder
RelOptInfos for partition relations (maybe mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any function scans simple_rel_array[], or postpone creating
simple_rel_array until pruning. 3. Examine all the current scanners,
especially the ones which will be called before pruning, to make sure that
skipping "other" relations is going to be kosher.
Patch augments the existing PartitionedChildRelInfo node, which currently
holds only the partitioned child rel RT indexes, to carry some more
information about the partition tree, which includes the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in the bound order).
Actually, since PartitionedChildRelInfo now contains more information
about the partition tree than it used to before, I thought the struct's
name is no longer relevant, so renamed it to PartitionRootInfo and renamed
root->pcinfo_list accordingly to prinfo_list. That seems okay because we
only use that node internally.
Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table. The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables. Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).
Partition-wise join requires the partition hierarchy to be expanded
level-by-level, keeping intact the parent-child relationship between a
partitioned table and its partitions. Your patch doesn't do that and
adds all the partitioning information to the root partitioned table's
RelOptInfo. OTOH, the partition-wise join patch adds partition bounds, key
expressions, OIDs and RelOptInfos of the immediate partitions
(including partitioned partitions) to the RelOptInfo of a partitioned
table (see patch 0002 in the latest set of patches at [1]). I don't
see much point in having conflicting changes in both of our patches.
Maybe you should review that patch from my set and we can find a set
of members which help both partition pruning and partition-wise join.
When set_append_rel_size() is called on the root partitioned table, it
will call a find_partitions_for_query(), which using the partition tree
information, determines the partitions that will need to be scanned for
the query. This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on. As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions. The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node. For each OID in that list, check if that OID is
in the set of leaf partition OIDs that was just computed, and if so, lock
it. For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list. Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected. The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().
set_append_rel_size() processing then continues for the root partitioned
table. It is at this point that we will create the RelOptInfos and
AppendRelInfos for partitions. First for those of the root partitioned
table and then for those of each partitioned table when
set_append_rel_size() will be recursively called for the latter.
set_append_rel_size() and set_append_rel_pathlist() are already
recursive, so if we process expansion and pruning for one level in
those functions, the recursion will automatically take care of doing
so for every level.
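To illustrate the level-by-level idea, here is a minimal, self-contained sketch. It is not code from either patch set: DemoPart, its pruned flag, and demo_collect_leaves() are hypothetical stand-ins. Each call looks only at one table's immediate children, and the recursion takes care of the deeper levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;

/* Hypothetical stand-in for one table in a partition tree. */
typedef struct DemoPart
{
    DemoOid     oid;
    bool        pruned;         /* assumed to be decided by some pruning step */
    int         nchildren;      /* 0 means "leaf" in this sketch */
    struct DemoPart **children; /* immediate partitions only */
} DemoPart;

/*
 * Append the OIDs of all unpruned leaves under 'part' to leaves[], starting
 * at index nleaves, and return the new count.  A pruned table is skipped
 * together with its whole subtree.  Note that a partitioned table with no
 * partitions would be treated as a leaf here, which real code must avoid.
 */
size_t
demo_collect_leaves(const DemoPart *part, DemoOid *leaves,
                    size_t nleaves, size_t maxleaves)
{
    assert(part != NULL);

    if (part->pruned)
        return nleaves;

    if (part->nchildren == 0)
    {
        assert(nleaves < maxleaves);
        leaves[nleaves++] = part->oid;
        return nleaves;
    }

    /* Handle one level here; recursion handles every deeper level. */
    for (int i = 0; i < part->nchildren; i++)
        nleaves = demo_collect_leaves(part->children[i], leaves,
                                      nleaves, maxleaves);

    return nleaves;
}
```

The leaves come out in tree-traversal order, which is why the order in which they are locked is a separate question in this thread.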
Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].
The changes to the code that handles expansion in this patch set should
really be part of the expansion-in-bound-order thread so that it's easy to
review all changes together. This thread can then concentrate only
on partition pruning.
[1]: /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Ashutosh,
Thanks for the comments and sorry that it took me a while to reply here.
On 2017/08/23 20:16, Ashutosh Bapat wrote:
On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.
Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful. They are to be applied on top of
the patches being discussed at [1]. Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning. I will share that patch after
debugging it some more.
The main design goal of the patches I'm sharing here now is to defer the
locking and opening of leaf partitions in a given partition tree to a
point after set_append_rel_size() is called on the root partitioned table.
Currently, AFAICS, we need to lock and open the child tables in
expand_inherited_rtentry() only to set the translated_vars field in
AppendRelInfo that we create for the child. ISTM, we can defer the
creation of a child AppendRelInfo to a point when it (its field
translated_vars in particular) will actually be used and so lock and open
the child tables only at such a time. Although we don't lock and open the
partition child tables in expand_inherited_rtentry(), their RT entries are
still created and added to root->parse->rtable, so that
setup_simple_rel_arrays() knows the maximum number of entries
root->simple_rel_array will need to hold and allocate the memory for that
array accordingly. Slots in simple_rel_array[] corresponding to
partition child tables will be empty until they are created when
set_append_rel_size() is called on the root parent table and it determines
the partitions that will be scanned after all.
The partition pruning can happen only after the quals have been
distributed to Rels i.e. after deconstruct_jointree(),
reconsider_outer_join_clauses() and generate_base_implied_equalities()
have been called. If the goal is to not heap_open() the partitions
which are pruned, we can't do that in expand_inherited_rtentry(). One
reason why I think we don't want to heap_open() partition relations is
to avoid relcache bloat because of opened partition relations, which
are ultimately pruned. But please note that according to your patches,
we still need to populate catalog caches to get relkind and reltype
etc.
Yes, we still hit syscache for *all* partitions. I haven't yet thought
very hard about avoiding that altogether.
There are many functions that traverse simple_rel_array[] after it's
created. Most of them assume that the empty entries in that array
correspond to non-simple range entries like join RTEs. But now we are
breaking that assumption. Most of these functions also skip "other"
relations, so that may be OK now, but I am not sure if it's really
going to be fine if we keep empty slots in place of partition
relations. There may be three options here: 1. Add placeholder
RelOptInfos for partition relations (maybe mark those specially) and
mark the ones which get pruned as dummy later. 2. Prune the partitions
before any function scans simple_rel_array[], or postpone creating
simple_rel_array until pruning. 3. Examine all the current scanners,
especially the ones which will be called before pruning, to make sure that
skipping "other" relations is going to be kosher.
Between the point when slots in simple_rel_array are allocated
(setup_simple_rel_arrays) and partition RelOptInfos are actually created
after the partition-pruning step has occurred (set_append_rel_size), it
seems that most places that iterate over simple_rel_array know also to
skip slots containing NULL values. We might need to document that NULL
means partitions in addition to its current meaning - non-baserels.
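As a rough illustration of why such scanners keep working, here is a self-contained sketch of a loop that skips NULL slots. The Demo* types are simplified stand-ins, not the planner's real RelOptInfo or RelOptKind, and demo_count_baserels() is a hypothetical function written only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the planner's RelOptKind; only the two kinds used here. */
typedef enum
{
    DEMO_BASEREL,
    DEMO_OTHER_MEMBER_REL
} DemoRelKind;

/* Stand-in for RelOptInfo; the real struct has many more fields. */
typedef struct DemoRelOptInfo
{
    DemoRelKind reloptkind;
    unsigned int relid;
} DemoRelOptInfo;

/*
 * Count base relations in a simple_rel_array-like array.  Slot 0 is unused,
 * mirroring the planner's 1-based RT indexes.  A NULL slot is skipped
 * without asking why it is NULL: it may belong to a non-baserel RTE such as
 * a join or, under the proposal, to a partition whose RelOptInfo has not
 * been built yet.
 */
int
demo_count_baserels(DemoRelOptInfo **rel_array, int array_size)
{
    int         count = 0;

    assert(rel_array != NULL && array_size >= 1);

    for (int rti = 1; rti < array_size; rti++)
    {
        DemoRelOptInfo *rel = rel_array[rti];

        if (rel == NULL)
            continue;           /* empty slot: nothing to look at */
        if (rel->reloptkind != DEMO_BASEREL)
            continue;           /* "other" rels are ignored here */
        count++;
    }

    return count;
}
```

A scanner written this way is indifferent to the new meaning of NULL. One that infers something from a slot being NULL, for example "this RTE is a join", would need a closer look.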
Patch augments the existing PartitionedChildRelInfo node, which currently
holds only the partitioned child rel RT indexes, to carry some more
information about the partition tree, which includes the information
returned by RelationGetPartitionDispatchInfo() when it is called from
expand_inherited_rtentry() (per the proposed patch in [1], we call it to
be able to add partitions to the query tree in the bound order).
Actually, since PartitionedChildRelInfo now contains more information
about the partition tree than it used to before, I thought the struct's
name is no longer relevant, so renamed it to PartitionRootInfo and renamed
root->pcinfo_list accordingly to prinfo_list. That seems okay because we
only use that node internally.
Then during the add_base_rels_to_query() step, when build_simple_rel()
builds a RelOptInfo for the root partitioned table, it also initializes
some newly introduced fields in RelOptInfo from the information contained
in PartitionRootInfo of the table. The aforementioned fields are only
initialized in RelOptInfos of root partitioned tables. Note that the
add_base_rels_to_query() step won't add the partition "otherrel"
RelOptInfos yet (unlike the regular inheritance case, where they are,
after looking them up in root->append_rel_list).
Partition-wise join requires the partition hierarchy to be expanded
level-by-level, keeping intact the parent-child relationship between a
partitioned table and its partitions. Your patch doesn't do that and
adds all the partitioning information to the root partitioned table's
RelOptInfo. OTOH, the partition-wise join patch adds partition bounds, key
expressions, OIDs and RelOptInfos of the immediate partitions
(including partitioned partitions) to the RelOptInfo of a partitioned
table (see patch 0002 in the latest set of patches at [1]). I don't
see much point in having conflicting changes in both of our patches.
Maybe you should review that patch from my set and we can find a set
of members which help both partition pruning and partition-wise join.
Yes, I think it would be a good idea for the partition-pruning patch to
initialize those fields in the individual parents' RelOptInfos. I will
review relevant patches in the partitionwise-join thread to see what can
be incorporated here.
When set_append_rel_size() is called on the root partitioned table, it
will call a find_partitions_for_query(), which using the partition tree
information, determines the partitions that will need to be scanned for
the query. This processing happens recursively, that is, we first
determine the root-parent's partitions and then for each partition that's
partitioned, we will determine its partitions and so on. As we determine
partitions in this per-partitioned-table manner, we maintain a pair
(parent_relid, list-of-partition-relids-to-scan) for each partitioned
table and also a single list of all leaf partitions determined so far.
Once all partitions have been determined, we turn to locking the leaf
partitions. The locking happens in the order of OIDs as
find_all_inheritors would have returned in expand_inherited_rtentry(); the
list of OIDs in that original order is also stored in the table's
PartitionRootInfo node. For each OID in that list, check if that OID is
in the set of leaf partition OIDs that was just computed, and if so, lock
it. For all chosen partitions that are partitioned tables (including the
root), we create a PartitionAppendInfo node which stores the
aforementioned pair (parent_relid, list-of-partitions-relids-to-scan), and
append it to a list in the root table's RelOptInfo, with the root table's
PartitionAppendInfo at the head of the list. Note that the list of
partitions in this pair contains only the immediate partitions, so that
the original parent-child relationship is reflected in the list of
PartitionAppendInfos thus collected. The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().
set_append_rel_size() processing then continues for the root partitioned
table. It is at this point that we will create the RelOptInfos and
AppendRelInfos for partitions. First for those of the root partitioned
table and then for those of each partitioned table when
set_append_rel_size() will be recursively called for the latter.
set_append_rel_size() and set_append_rel_pathlist() are already
recursive, so if we process expansion and pruning for one level in
those functions, the recursion will automatically take care of doing
so for every level.
My only worry about that is that the locking order of leaf partitions will
differ from that of concurrent backends if we lock them in an order
dictated by traversing the partition tree one level at a time, because such
a traversal will presumably proceed in partition bound order.
The patch computes *all* leaf partitions that will need to be scanned by
the query (after pruning needless ones) and locks the chosen partitions in
the original order (it keeps the original order OID list generated by
find_all_inheritors around for this purpose). While computing the leaf
partitions, it remembers immediate parent-child relationships in the
process. The way it does it is by processing the partition tree in a
recursive depth-first manner, and in each recursive step, creating a
PartitionAppendInfo that maps a parent table to its immediate children
(only those that will satisfy the query). Once the PartitionAppendInfo's
for all the parents in the partition tree have been computed, we resume
the set_append_rel_size() processing, which takes the root parent's
PartitionAppendInfo and builds RelOptInfos and AppendRelInfos for its
immediate children. Each child that is itself a partitioned table is
handled by a recursive call to set_append_rel_size(), which looks up that
child's own PartitionAppendInfo and creates RelOptInfos and AppendRelInfos
for its children, and so on.
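The locking rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the patch's code: DemoOid, demo_oid_in_set() and demo_lock_order() are hypothetical names, the linear membership test is chosen only for brevity, and no actual locking is done. The function just computes the order in which locks would be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;

/* Linear membership test; a real implementation would want something faster. */
static bool
demo_oid_in_set(DemoOid oid, const DemoOid *set, size_t nset)
{
    for (size_t i = 0; i < nset; i++)
    {
        if (set[i] == oid)
            return true;
    }
    return false;
}

/*
 * Walk orig_order (assumed to be the duplicate-free OID list in the order
 * find_all_inheritors() produced it) and copy every OID that is also in
 * chosen[] to lock_order[].  Returns the number of OIDs written.
 * lock_order[] must have room for nchosen entries.
 */
size_t
demo_lock_order(const DemoOid *orig_order, size_t norig,
                const DemoOid *chosen, size_t nchosen,
                DemoOid *lock_order)
{
    size_t      nlocked = 0;

    assert(orig_order != NULL || norig == 0);

    for (size_t i = 0; i < norig; i++)
    {
        if (demo_oid_in_set(orig_order[i], chosen, nchosen))
        {
            assert(nlocked < nchosen);
            lock_order[nlocked++] = orig_order[i];
        }
    }

    return nlocked;
}
```

Because the output order depends only on orig_order, two backends that chose the same leaves through different traversals would still request the locks in the same sequence.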
Note that this is still largely a WIP patch and the implementation details
might change per both the feedback here and the discussion at [1].
The changes to the code that handles expansion in this patch set should
really be part of the expansion-in-bound-order thread so that it's easy to
review all changes together. This thread can then concentrate only
on partition pruning.
I think I agree. I'm posting today the patches that actually implement
partition-pruning. The previous patches do seem to belong on the EIBO
(expansion in bound order) thread, but I will post them together here today
so that they can be applied to HEAD and tried out. Now that Robert has
posted a patch to implement depth-first EIBO, I will have to find a way to
rebase the actual partition-pruning patches on this thread so that their
core logic can still find the information it needs.
Thanks,
Amit
On 2017/08/21 15:37, Amit Langote wrote:
Meanwhile, I thought I'd share a couple of patches that implement some
restructuring of the planner code related to partitioned table inheritance
planning that I think would be helpful. They are to be applied on top of
the patches being discussed at [1]. Note that these patches themselves
don't implement the actual code that replaces constraint exclusion as a
method of performing partition pruning. I will share that patch after
debugging it some more.
The next patch that will implement
actual partition-pruning will add some more code that will run under
find_partitions_for_query().
Attached is now also the set of patches that implement the actual
partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
the attached.
Because the patch helps avoid performing constraint exclusion on *all*
partitions for a given query, one might expect this to improve performance
for queries on partitioned tables and scale to a fairly large number of
partitions. Here are some numbers for the partitioned table and the query
shown below:
\d+ ptab
Columns: (a date, b int, c text)
Partition key: RANGE (a, b)
Partitions:
ptab_00001 FOR VALUES FROM ('2017-08-31', 1) TO ('2017-08-31', 1000),
ptab_00002 FOR VALUES FROM ('2017-08-31', 1000) TO ('2017-08-31', 2000),
ptab_00003 FOR VALUES FROM ('2017-08-31', 2000) TO ('2017-08-31', 3000),
ptab_00004 FOR VALUES FROM ('2017-08-31', 3000) TO ('2017-08-31', 4000),
ptab_00005 FOR VALUES FROM ('2017-08-31', 4000) TO ('2017-08-31', 5000),
ptab_00006 FOR VALUES FROM ('2017-09-01', 1) TO ('2017-09-01', 1000),
ptab_00007 FOR VALUES FROM ('2017-09-01', 1000) TO ('2017-09-01', 2000),
...
ptab_NNNNN FOR VALUES FROM (..., 4000) TO (..., 5000),
A query that prunes all partitions (empty result!):
explain select * from ptab where a < '2017-08-31';
Comparison of the average response times (in milliseconds) after running
the same query 100 times using pgbench against the database:
#: Number of partitions of ptab
c_e: Constraint exclusion
f_p: Fast pruning
# c_e f_p
===== ===== ====
10 0.7 0.4
50 1.8 0.6
100 3.2 0.8
500 16.8 2.7
1000 36.2 5.0
2000 79.7 10.2
5000 214.7 27.0
10000 443.6 64.8
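For reading the table, the speedup is simply c_e divided by f_p. The small sketch below transcribes the numbers above and computes that ratio. The DemoTiming struct and demo_speedup() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table above: average response times in milliseconds. */
typedef struct DemoTiming
{
    int         nparts;
    double      c_e_ms;         /* constraint exclusion */
    double      f_p_ms;         /* fast pruning */
} DemoTiming;

/* Values transcribed from the table in this message. */
const DemoTiming demo_timings[] = {
    {10, 0.7, 0.4},
    {50, 1.8, 0.6},
    {100, 3.2, 0.8},
    {500, 16.8, 2.7},
    {1000, 36.2, 5.0},
    {2000, 79.7, 10.2},
    {5000, 214.7, 27.0},
    {10000, 443.6, 64.8},
};

/* How many times faster fast pruning was than constraint exclusion. */
double
demo_speedup(const DemoTiming *t)
{
    assert(t != NULL && t->f_p_ms > 0.0);
    return t->c_e_ms / t->f_p_ms;
}
```

By this measure the ratio is 1.75 at 10 partitions and stays between roughly 6 and 8 from 500 partitions upward.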
For each partitioned table in a given partition tree (provided it is not
pruned to begin with), the query's clauses are matched to its partition
key and from the matched clauses, a pair of bounding keys (Datum tuples
with <= key->partnatts values for possibly a prefix of a multi-column key)
is generated. They are passed to partition.c: get_partitions_for_keys()
as Datum *minkeys and Datum *maxkeys. A list of partitions covering that
key range is returned. When generating that list, whether a particular
scan key is inclusive or not is considered along with the partitioning
strategy. It should be possible to support hash partitioning with
(hopefully) minimal changes to get_partitions_for_keys().
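To make the idea concrete, here is a much simplified sketch of looking up the partitions that cover a key range. It is not get_partitions_for_keys(): it assumes a single integer key column and contiguous range partitions, where partition i covers [bounds[i], bounds[i+1]). The names demo_count_le() and demo_partitions_for_range() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of entries in the ascending array bounds[0..n) that are <= key. */
static int
demo_count_le(const int *bounds, int n, int key)
{
    int         lo = 0;
    int         hi = n;

    while (lo < hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (bounds[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Find the partitions overlapping the scan range given by minkey/maxkey.
 * bounds[] holds nparts + 1 ascending values.  Returns false if no
 * partition can match; otherwise sets *first and *last (inclusive indexes).
 * Callers pass INT_MIN / INT_MAX with the inclusive flag for an open side.
 */
bool
demo_partitions_for_range(const int *bounds, int nparts,
                          int minkey, bool min_incl,
                          int maxkey, bool max_incl,
                          int *first, int *last)
{
    int         nbounds = nparts + 1;
    int         n_le_min;
    int         n_le_max;

    assert(bounds != NULL && nparts >= 1);

    /* Turn exclusive integer keys into inclusive ones, avoiding overflow. */
    if (!min_incl)
    {
        if (minkey == INT_MAX)
            return false;
        minkey++;
    }
    if (!max_incl)
    {
        if (maxkey == INT_MIN)
            return false;
        maxkey--;
    }
    if (minkey > maxkey)
        return false;

    n_le_min = demo_count_le(bounds, nbounds, minkey);
    n_le_max = demo_count_le(bounds, nbounds, maxkey);

    /* Range lies entirely above the last bound or below the first one. */
    if (n_le_min == nbounds || n_le_max == 0)
        return false;

    *first = (n_le_min == 0) ? 0 : n_le_min - 1;
    *last = (n_le_max - 1 < nparts - 1) ? n_le_max - 1 : nparts - 1;
    return true;
}
```

Two binary searches replace a per-partition check, which is the kind of saving the numbers above point to. The real function additionally has to deal with multi-column keys, key prefixes, list partitioning, and NULLs, none of which this sketch attempts.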
There are still certain limitations on the planner side of things:
1. OR clauses are not counted as contributing toward bounding scan keys;
currently only OpExprs and NullTests are recognized, so an OR clause
would get skipped from consideration when generating the bounding keys
to pass to partition.c
2. Redundant clauses are not appropriately pre-processed; so if a query
contains a = 10 and a > 1, the latter clause will be matched and
partitions holding values with a > 1 and a < 10 will not be pruned,
even if none of their rows will pass the query's condition
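One possible way to address the second limitation is to fold all clauses on a key column into the tightest range before generating bounding keys. The sketch below is not what the posted patches do, and it is limited to a single integer column. DemoOp, DemoClause, DemoRange and demo_tightest_range() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    DEMO_EQ,
    DEMO_LT,
    DEMO_LE,
    DEMO_GT,
    DEMO_GE
} DemoOp;

/* One clause of the form "key <op> value" on a single integer column. */
typedef struct DemoClause
{
    DemoOp      op;
    int         value;
} DemoClause;

/* Closed interval [lo, hi]; 'empty' means no value can satisfy the clauses. */
typedef struct DemoRange
{
    int         lo;
    int         hi;
    bool        empty;
} DemoRange;

/* Intersect the ranges implied by all ANDed clauses. */
DemoRange
demo_tightest_range(const DemoClause *clauses, int nclauses)
{
    DemoRange   r = {INT_MIN, INT_MAX, false};

    assert(clauses != NULL || nclauses == 0);

    for (int i = 0; i < nclauses; i++)
    {
        int         v = clauses[i].value;
        int         lo = INT_MIN;
        int         hi = INT_MAX;

        switch (clauses[i].op)
        {
            case DEMO_EQ:
                lo = hi = v;
                break;
            case DEMO_GE:
                lo = v;
                break;
            case DEMO_GT:
                if (v == INT_MAX)
                    r.empty = true; /* nothing is greater than INT_MAX */
                else
                    lo = v + 1;
                break;
            case DEMO_LE:
                hi = v;
                break;
            case DEMO_LT:
                if (v == INT_MIN)
                    r.empty = true; /* nothing is less than INT_MIN */
                else
                    hi = v - 1;
                break;
        }

        if (lo > r.lo)
            r.lo = lo;
        if (hi < r.hi)
            r.hi = hi;
    }

    if (r.lo > r.hi)
        r.empty = true;

    return r;
}
```

With a = 10 and a > 1 this yields the range [10, 10], so only the partition containing 10 would need to be scanned. Contradictory clauses such as a > 5 and a < 3 yield an empty range.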
Fixing these limitations, adding more regression tests, and implementing
some of the things suggested by Ashutosh Bapat [1] to prevent conflicting
changes with some preparatory patches in the partitionwise-join patch
series [2] are TODOs.
Adding this to CF-201709 as "faster partition pruning in planner".
To try out the attached patches: apply the patches posted at [3] on HEAD
and then apply these.
Thanks,
Amit
[1]: /messages/by-id/CAFjFpRdb_fkmJHFjvAbB+Ln0t45fWjekLd5pY=sv+eAhBAKXPQ@mail.gmail.com
[2]: /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com
[3]: /messages/by-id/2124e99f-9528-0f71-4e10-ac7974dd7077@lab.ntt.co.jp
Attachments:
0001-Teach-pg_inherits.c-a-bit-about-partitioning.patch (text/plain; charset=UTF-8)
From 1da6477fe698ddd8123d54274fb327dccb559043 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 8 Aug 2017 18:42:30 +0900
Subject: [PATCH 1/6] Teach pg_inherits.c a bit about partitioning
Both find_inheritance_children and find_all_inheritors now list
partitioned child tables before non-partitioned ones and return
the number of partitioned tables in an optional output argument
We also now store in pg_inherits, when adding a new child, if the
child is a partitioned table.
Per design idea from Robert Haas
---
contrib/sepgsql/dml.c | 2 +-
doc/src/sgml/catalogs.sgml | 10 +++
src/backend/catalog/partition.c | 2 +-
src/backend/catalog/pg_inherits.c | 157 ++++++++++++++++++++++++++-------
src/backend/commands/analyze.c | 3 +-
src/backend/commands/lockcmds.c | 2 +-
src/backend/commands/publicationcmds.c | 2 +-
src/backend/commands/tablecmds.c | 56 +++++++-----
src/backend/commands/vacuum.c | 3 +-
src/backend/executor/execMain.c | 3 +-
src/backend/optimizer/prep/prepunion.c | 2 +-
src/include/catalog/pg_inherits.h | 20 ++++-
src/include/catalog/pg_inherits_fn.h | 5 +-
13 files changed, 200 insertions(+), 67 deletions(-)
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index b643720e36..6fc279805c 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,7 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
if (!rte->inh)
tableIds = list_make1_oid(rte->relid);
else
- tableIds = find_all_inheritors(rte->relid, NoLock, NULL);
+ tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
foreach(li, tableIds)
{
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef7054cf26..00ba2906c2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3894,6 +3894,16 @@ SCRAM-SHA-256$<replaceable><iteration count></>:<replaceable><salt><
inherited columns are to be arranged. The count starts at 1.
</entry>
</row>
+
+ <row>
+ <entry><structfield>inhchildpartitioned</structfield></entry>
+ <entry><type>bool</type></entry>
+ <entry></entry>
+ <entry>
+ This is <literal>true</> if the child table is a partitioned table,
+ <literal>false</> otherwise
+ </entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index c92756ecd5..fe8e60de14 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -178,7 +178,7 @@ RelationBuildPartitionDesc(Relation rel)
return;
/* Get partition oids from pg_inherits */
- inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+ inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
/* Collect bound spec nodes in a list */
i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 245a374fc9..5292ec8058 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -33,6 +33,8 @@
#include "utils/syscache.h"
#include "utils/tqual.h"
+static int32 inhchildinfo_cmp(const void *p1, const void *p2);
+
/*
* Entry of a hash table used in find_all_inheritors. See below.
*/
@@ -42,6 +44,30 @@ typedef struct SeenRelsEntry
ListCell *numparents_cell; /* corresponding list cell */
} SeenRelsEntry;
+/* Information about one inheritance child table. */
+typedef struct InhChildInfo
+{
+ Oid relid;
+ bool is_partitioned;
+} InhChildInfo;
+
+#define OID_CMP(o1, o2) \
+ ((o1) < (o2) ? -1 : ((o1) > (o2) ? 1 : 0))
+
+static int32
+inhchildinfo_cmp(const void *p1, const void *p2)
+{
+ InhChildInfo c1 = *((const InhChildInfo *) p1);
+ InhChildInfo c2 = *((const InhChildInfo *) p2);
+
+ if (c1.is_partitioned && !c2.is_partitioned)
+ return -1;
+ if (!c1.is_partitioned && c2.is_partitioned)
+ return 1;
+
+ return OID_CMP(c1.relid, c2.relid);
+}
+
/*
* find_inheritance_children
*
@@ -54,7 +80,8 @@ typedef struct SeenRelsEntry
* against possible DROPs of child relations.
*/
List *
-find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
+find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ int *num_partitioned_children)
{
List *list = NIL;
Relation relation;
@@ -62,9 +89,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
ScanKeyData key[1];
HeapTuple inheritsTuple;
Oid inhrelid;
- Oid *oidarr;
- int maxoids,
- numoids,
+ InhChildInfo *inhchildren;
+ int maxchildren,
+ numchildren,
+ my_num_partitioned_children,
i;
/*
@@ -77,9 +105,10 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
/*
* Scan pg_inherits and build a working array of subclass OIDs.
*/
- maxoids = 32;
- oidarr = (Oid *) palloc(maxoids * sizeof(Oid));
- numoids = 0;
+ maxchildren = 32;
+ inhchildren = (InhChildInfo *) palloc(maxchildren * sizeof(InhChildInfo));
+ numchildren = 0;
+ my_num_partitioned_children = 0;
relation = heap_open(InheritsRelationId, AccessShareLock);
@@ -93,34 +122,45 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
while ((inheritsTuple = systable_getnext(scan)) != NULL)
{
- inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
- if (numoids >= maxoids)
+ Form_pg_inherits form = (Form_pg_inherits) GETSTRUCT(inheritsTuple);
+
+ if (numchildren >= maxchildren)
{
- maxoids *= 2;
- oidarr = (Oid *) repalloc(oidarr, maxoids * sizeof(Oid));
+ maxchildren *= 2;
+ inhchildren = (InhChildInfo *) repalloc(inhchildren,
+ maxchildren * sizeof(InhChildInfo));
}
- oidarr[numoids++] = inhrelid;
+ inhchildren[numchildren].relid = form->inhrelid;
+ inhchildren[numchildren].is_partitioned = form->inhpartitioned;
+
+ if (form->inhpartitioned)
+ my_num_partitioned_children++;
+ numchildren++;
}
systable_endscan(scan);
heap_close(relation, AccessShareLock);
+ if (num_partitioned_children)
+ *num_partitioned_children = my_num_partitioned_children;
+
/*
* If we found more than one child, sort them by OID. This ensures
* reasonably consistent behavior regardless of the vagaries of an
* indexscan. This is important since we need to be sure all backends
* lock children in the same order to avoid needless deadlocks.
*/
- if (numoids > 1)
- qsort(oidarr, numoids, sizeof(Oid), oid_cmp);
+ if (numchildren > 1)
+ qsort(inhchildren, numchildren, sizeof(InhChildInfo),
+ inhchildinfo_cmp);
/*
* Acquire locks and build the result list.
*/
- for (i = 0; i < numoids; i++)
+ for (i = 0; i < numchildren; i++)
{
- inhrelid = oidarr[i];
+ inhrelid = inhchildren[i].relid;
if (lockmode != NoLock)
{
@@ -144,7 +184,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
list = lappend_oid(list, inhrelid);
}
- pfree(oidarr);
+ pfree(inhchildren);
return list;
}
@@ -159,19 +199,30 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode)
* given rel.
*
* The specified lock type is acquired on all child relations (but not on the
- * given rel; caller should already have locked it). If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is specified, in which case, only the
+ * child relations that are partitioned tables are locked. If lockmode is
+ * NoLock then no locks are acquired, but caller must beware of race
+ * conditions against possible DROPs of child relations.
+ *
+ * Returned list of OIDs is such that all the partitioned tables in the tree
+ * appear at the head of the list. If num_partitioned_children is non-NULL,
+ * *num_partitioned_children returns the number of partitioned child table
+ * OIDs at the head of the list.
*/
List *
-find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
+find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ List **numparents, int *num_partitioned_children)
{
/* hash table for O(1) rel_oid -> rel_numparents cell lookup */
HTAB *seen_rels;
HASHCTL ctl;
List *rels_list,
- *rel_numparents;
+ *rel_numparents,
+ *partitioned_rels_list,
+ *other_rels_list;
ListCell *l;
+ int my_num_partitioned_children;
memset(&ctl, 0, sizeof(ctl));
ctl.keysize = sizeof(Oid);
@@ -185,31 +236,69 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
/*
* We build a list starting with the given rel and adding all direct and
- * indirect children. We can use a single list as both the record of
- * already-found rels and the agenda of rels yet to be scanned for more
- * children. This is a bit tricky but works because the foreach() macro
- * doesn't fetch the next list element until the bottom of the loop.
+ * indirect children. We can use a single list (rels_list) as both the
+ * record of already-found rels and the agenda of rels yet to be scanned
+ * for more children. This is a bit tricky but works because the foreach()
+ * macro doesn't fetch the next list element until the bottom of the loop.
+ *
+ * partitioned_rels_list will contain the OIDs of the partitioned child
+ * tables and other_rels_list will contain the OIDs of the non-partitioned
+ * child tables. Result list will be generated by concatenating the two
+ * lists together with partitioned_rels_list appearing first.
*/
rels_list = list_make1_oid(parentrelId);
+ partitioned_rels_list = list_make1_oid(parentrelId);
+ other_rels_list = NIL;
rel_numparents = list_make1_int(0);
+ my_num_partitioned_children = 0;
+
foreach(l, rels_list)
{
Oid currentrel = lfirst_oid(l);
List *currentchildren;
- ListCell *lc;
+ ListCell *lc,
+ *first_nonpartitioned_child;
+ int cur_num_partitioned_children = 0,
+ i;
/* Get the direct children of this rel */
- currentchildren = find_inheritance_children(currentrel, lockmode);
+ currentchildren = find_inheritance_children(currentrel, lockmode,
+ &cur_num_partitioned_children);
+
+ my_num_partitioned_children += cur_num_partitioned_children;
+
+ /*
+ * Append partitioned children to rels_list and partitioned_rels_list.
+ * We know for sure that partitioned children don't need the
+ * de-duplication logic in the following loop, because partitioned
+ * tables are not allowed to participate in multiple inheritance.
+ */
+ i = 0;
+ foreach(lc, currentchildren)
+ {
+ if (i < cur_num_partitioned_children)
+ {
+ Oid child_oid = lfirst_oid(lc);
+
+ rels_list = lappend_oid(rels_list, child_oid);
+ partitioned_rels_list = lappend_oid(partitioned_rels_list,
+ child_oid);
+ }
+ else
+ break;
+ i++;
+ }
+ first_nonpartitioned_child = lc;
/*
* Add to the queue only those children not already seen. This avoids
* making duplicate entries in case of multiple inheritance paths from
* the same parent. (It'll also keep us from getting into an infinite
* loop, though theoretically there can't be any cycles in the
- * inheritance graph anyway.)
+ * inheritance graph anyway.) Also, add them to the other_rels_list.
*/
- foreach(lc, currentchildren)
+ for_each_cell(lc, first_nonpartitioned_child)
{
Oid child_oid = lfirst_oid(lc);
bool found;
@@ -225,6 +314,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
{
/* if it's not there, add it. expect 1 parent, initially. */
rels_list = lappend_oid(rels_list, child_oid);
+ other_rels_list = lappend_oid(other_rels_list, child_oid);
rel_numparents = lappend_int(rel_numparents, 1);
hash_entry->numparents_cell = rel_numparents->tail;
}
@@ -237,8 +327,13 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode, List **numparents)
list_free(rel_numparents);
hash_destroy(seen_rels);
+ list_free(rels_list);
+
+ if (num_partitioned_children)
+ *num_partitioned_children = my_num_partitioned_children;
- return rels_list;
+ /* List partitioned child tables before non-partitioned ones. */
+ return list_concat(partitioned_rels_list, other_rels_list);
}
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index fbad13ea94..10cc2b8314 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,7 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
* the children.
*/
tableOIDs =
- find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL);
+ find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
+ NULL);
/*
* Check that there's at least one descendant, else fail. This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 9fe9e022b0..529f244f7e 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
List *children;
ListCell *lc;
- children = find_inheritance_children(reloid, NoLock);
+ children = find_inheritance_children(reloid, NoLock, NULL);
foreach(lc, children)
{
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 610cb499d2..64179ea3ef 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
List *children;
children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
- NULL);
+ NULL, NULL);
foreach(child, children)
{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0f08245a67..4d686a6f71 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -299,10 +299,10 @@ static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
static void MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel);
static void StoreCatalogInheritance(Oid relationId, List *supers,
- bool child_is_partition);
+ bool child_is_partition, bool child_is_partitioned);
static void StoreCatalogInheritance1(Oid relationId, Oid parentOid,
int16 seqNumber, Relation inhRelation,
- bool child_is_partition);
+ bool child_is_partition, bool child_is_partitioned);
static int findAttrByName(const char *attributeName, List *schema);
static void AlterIndexNamespaces(Relation classRel, Relation rel,
Oid oldNspOid, Oid newNspOid, ObjectAddresses *objsMoved);
@@ -753,7 +753,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
typaddress);
/* Store inheritance information for new rel. */
- StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL);
+ StoreCatalogInheritance(relationId, inheritOids, stmt->partbound != NULL,
+ relkind == RELKIND_PARTITIONED_TABLE);
/*
* We must bump the command counter to make the newly-created relation
@@ -1238,7 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
ListCell *child;
List *children;
- children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL);
+ children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
+ NULL);
foreach(child, children)
{
@@ -2305,7 +2307,7 @@ MergeCheckConstraint(List *constraints, char *name, Node *expr)
*/
static void
StoreCatalogInheritance(Oid relationId, List *supers,
- bool child_is_partition)
+ bool child_is_partition, bool child_is_partitioned)
{
Relation relation;
int16 seqNumber;
@@ -2336,7 +2338,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
Oid parentOid = lfirst_oid(entry);
StoreCatalogInheritance1(relationId, parentOid, seqNumber, relation,
- child_is_partition);
+ child_is_partition, child_is_partitioned);
seqNumber++;
}
@@ -2350,7 +2352,7 @@ StoreCatalogInheritance(Oid relationId, List *supers,
static void
StoreCatalogInheritance1(Oid relationId, Oid parentOid,
int16 seqNumber, Relation inhRelation,
- bool child_is_partition)
+ bool child_is_partition, bool child_is_partitioned)
{
TupleDesc desc = RelationGetDescr(inhRelation);
Datum values[Natts_pg_inherits];
@@ -2365,6 +2367,8 @@ StoreCatalogInheritance1(Oid relationId, Oid parentOid,
values[Anum_pg_inherits_inhrelid - 1] = ObjectIdGetDatum(relationId);
values[Anum_pg_inherits_inhparent - 1] = ObjectIdGetDatum(parentOid);
values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber);
+ values[Anum_pg_inherits_inhpartitioned - 1] =
+ BoolGetDatum(child_is_partitioned);
memset(nulls, 0, sizeof(nulls));
@@ -2564,7 +2568,7 @@ renameatt_internal(Oid myrelid,
* outside the inheritance hierarchy being processed.
*/
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents);
+ &child_numparents, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -2591,7 +2595,7 @@ renameatt_internal(Oid myrelid,
* expected_parents will only be 0 if we are not already recursing.
*/
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2774,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
*li;
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents);
+ &child_numparents, NULL);
forboth(lo, child_oids, li, child_numparents)
{
@@ -2790,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
else
{
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4803,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL);
+ children = find_all_inheritors(relid, lockmode, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -5212,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
*/
if (colDef->identity &&
recurse &&
- find_inheritance_children(myrelid, NoLock) != NIL)
+ find_inheritance_children(myrelid, NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5418,7 +5422,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
/*
* If we are told not to recurse, there had better not be any child
@@ -6537,7 +6542,8 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
if (children)
{
@@ -6971,7 +6977,8 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
* routines, we have to do this one level of recursion at a time; we can't
* use find_all_inheritors to do it in one pass.
*/
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
/*
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -7692,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
*/
if (!recursing && !con->connoinherit)
children = find_all_inheritors(RelationGetRelid(rel),
- lockmode, NULL);
+ lockmode, NULL, NULL);
/*
* For CHECK constraints, we must ensure that we only mark the
@@ -8575,7 +8582,8 @@ ATExecDropConstraint(Relation rel, const char *constrName,
* use find_all_inheritors to do it in one pass.
*/
if (!is_no_inherit_constraint)
- children = find_inheritance_children(RelationGetRelid(rel), lockmode);
+ children = find_inheritance_children(RelationGetRelid(rel), lockmode,
+ NULL);
else
children = NIL;
@@ -8864,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL);
+ children = find_all_inheritors(relid, lockmode, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -8915,7 +8923,8 @@ ATPrepAlterColumnType(List **wqueue,
}
}
else if (!recursing &&
- find_inheritance_children(RelationGetRelid(rel), NoLock) != NIL)
+ find_inheritance_children(RelationGetRelid(rel),
+ NoLock, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11027,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
* We use weakest lock we can on child's children, namely AccessShareLock.
*/
children = find_all_inheritors(RelationGetRelid(child_rel),
- AccessShareLock, NULL);
+ AccessShareLock, NULL, NULL);
if (list_member_oid(children, RelationGetRelid(parent_rel)))
ereport(ERROR,
@@ -11136,6 +11145,8 @@ CreateInheritance(Relation child_rel, Relation parent_rel)
inhseqno + 1,
catalogRelation,
parent_rel->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE,
+ child_rel->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE);
/* Now we're done with pg_inherits */
@@ -13696,7 +13707,8 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
* weaker lock now and the stronger one only when needed.
*/
attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
- AccessExclusiveLock, NULL);
+ AccessExclusiveLock, NULL,
+ NULL);
if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_TABLE),
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa181207a..e2e5ffce42 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,7 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
oldcontext = MemoryContextSwitchTo(vac_context);
if (include_parts)
oid_list = list_concat(oid_list,
- find_all_inheritors(relid, NoLock, NULL));
+ find_all_inheritors(relid, NoLock, NULL,
+ NULL));
else
oid_list = lappend_oid(oid_list, relid);
MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 23ed2c55b9..b63abba1e4 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,7 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
* Get the information about the partition tree after locking all the
* partitions.
*/
- (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
+ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
+ NULL);
pds = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts);
/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2202ad9941..81865cad7d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
/*
* Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 26bfab5db6..9f59c017e7 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -30,9 +30,20 @@
CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
{
+ /* OID of the child table. */
Oid inhrelid;
+
+ /* OID of the parent table. */
Oid inhparent;
+
+ /*
+ * Sequence number (starting with 1) of this parent, if this child table
+ * has multiple parents.
+ */
int32 inhseqno;
+
+ /* true if the child is a partitioned table, false otherwise. */
+ bool inhpartitioned;
} FormData_pg_inherits;
/* ----------------
@@ -46,10 +57,11 @@ typedef FormData_pg_inherits *Form_pg_inherits;
* compiler constants for pg_inherits
* ----------------
*/
-#define Natts_pg_inherits 3
-#define Anum_pg_inherits_inhrelid 1
-#define Anum_pg_inherits_inhparent 2
-#define Anum_pg_inherits_inhseqno 3
+#define Natts_pg_inherits 4
+#define Anum_pg_inherits_inhrelid 1
+#define Anum_pg_inherits_inhparent 2
+#define Anum_pg_inherits_inhseqno 3
+#define Anum_pg_inherits_inhpartitioned 4
/* ----------------
* pg_inherits has no initial contents
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 7743388899..8f371acae7 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -17,9 +17,10 @@
#include "nodes/pg_list.h"
#include "storage/lock.h"
-extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode);
+extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ int *num_partitioned_children);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
- List **parents);
+ List **parents, int *num_partitioned_children);
extern bool has_subclass(Oid relationId);
extern bool has_superclass(Oid relationId);
extern bool typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId);
--
2.11.0
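
To make the ordering that 0001 relies on easier to picture, here is a minimal standalone sketch. It assumes the comparator puts partitioned children first and breaks ties by OID; the body of inhchildinfo_cmp is not shown in this part of the patch, so sketch_inhchildinfo_cmp and sketch_order_children are hypothetical names, and Oid is a plain typedef standing in for the PostgreSQL type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's Oid type (assumption for this sketch). */
typedef unsigned int Oid;

/* Mirrors the two fields the patch keeps per pg_inherits row. */
typedef struct InhChildInfo
{
    Oid  relid;
    bool is_partitioned;
} InhChildInfo;

/*
 * Hypothetical comparator: partitioned children sort before
 * non-partitioned ones; within each group, ascending OID.
 */
int
sketch_inhchildinfo_cmp(const void *p1, const void *p2)
{
    const InhChildInfo *a = p1;
    const InhChildInfo *b = p2;

    if (a->is_partitioned != b->is_partitioned)
        return a->is_partitioned ? -1 : 1;
    if (a->relid != b->relid)
        return (a->relid < b->relid) ? -1 : 1;
    return 0;
}

/*
 * Sort the children in place and return how many partitioned children
 * now sit at the head of the array.
 */
int
sketch_order_children(InhChildInfo *children, int nchildren)
{
    int num_partitioned = 0;
    int i;

    assert(children != NULL || nchildren == 0);

    for (i = 0; i < nchildren; i++)
    {
        if (children[i].is_partitioned)
            num_partitioned++;
    }

    if (nchildren > 1)
        qsort(children, (size_t) nchildren, sizeof(InhChildInfo),
              sketch_inhchildinfo_cmp);

    return num_partitioned;
}
```

With this ordering, a caller that knows the returned count can treat the first that-many entries as partitioned tables without looking at each one, which is what the i < cur_num_partitioned_children test in find_all_inheritors depends on. Keeping the OID order within each group preserves the consistent lock ordering that the existing comment asks for.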
Attachment: 0002-Allow-locking-only-partitioned-children-in-partition.patch (text/plain; charset=UTF-8)
From b92ef9ec07841942ceedd173dd81e3286419202f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 10 Aug 2017 17:59:18 +0900
Subject: [PATCH 2/6] Allow locking only partitioned children in partition tree
find_inheritance_children will still return the OIDs of the
non-partitioned children, but does not lock them if the caller so
requests.
None of the callers pass 'true' yet though.
---
contrib/sepgsql/dml.c | 3 ++-
src/backend/catalog/partition.c | 3 ++-
src/backend/catalog/pg_inherits.c | 20 ++++++++++++++++----
src/backend/commands/analyze.c | 4 ++--
src/backend/commands/lockcmds.c | 2 +-
src/backend/commands/publicationcmds.c | 2 +-
src/backend/commands/tablecmds.c | 34 +++++++++++++++++-----------------
src/backend/commands/vacuum.c | 4 ++--
src/backend/executor/execMain.c | 4 ++--
src/backend/optimizer/prep/prepunion.c | 2 +-
src/include/catalog/pg_inherits_fn.h | 2 ++
11 files changed, 48 insertions(+), 32 deletions(-)
diff --git a/contrib/sepgsql/dml.c b/contrib/sepgsql/dml.c
index 6fc279805c..91f338f8bf 100644
--- a/contrib/sepgsql/dml.c
+++ b/contrib/sepgsql/dml.c
@@ -333,7 +333,8 @@ sepgsql_dml_privileges(List *rangeTabls, bool abort_on_violation)
if (!rte->inh)
tableIds = list_make1_oid(rte->relid);
else
- tableIds = find_all_inheritors(rte->relid, NoLock, NULL, NULL);
+ tableIds = find_all_inheritors(rte->relid, NoLock, false,
+ NULL, NULL);
foreach(li, tableIds)
{
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fe8e60de14..9645381fcb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -178,7 +178,8 @@ RelationBuildPartitionDesc(Relation rel)
return;
/* Get partition oids from pg_inherits */
- inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, NULL);
+ inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock, false,
+ NULL);
/* Collect bound spec nodes in a list */
i = 0;
diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index 5292ec8058..72420f65f1 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -74,13 +74,16 @@ inhchildinfo_cmp(const void *p1, const void *p2)
* Returns a list containing the OIDs of all relations which
* inherit *directly* from the relation with OID 'parentrelId'.
*
- * The specified lock type is acquired on each child relation (but not on the
- * given rel; caller should already have locked it). If lockmode is NoLock
- * then no locks are acquired, but caller must beware of race conditions
- * against possible DROPs of child relations.
+ * The specified lock type is acquired on each child relation (but not on the
+ * given rel; caller should already have locked it), unless
+ * lock_only_partitioned_children is specified, in which case only partitioned
+ * children are locked. If lockmode is NoLock then no locks are acquired, but
+ * caller must beware of race conditions against possible DROPs of child
+ * relations.
*/
List *
find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
int *num_partitioned_children)
{
List *list = NIL;
@@ -162,6 +165,13 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
{
inhrelid = inhchildren[i].relid;
+ /* If requested, skip locking non-partitioned children. */
+ if (lock_only_partitioned_children && i >= my_num_partitioned_children)
+ {
+ list = lappend_oid(list, inhrelid);
+ continue;
+ }
+
if (lockmode != NoLock)
{
/* Get the lock to synchronize against concurrent drop */
@@ -212,6 +222,7 @@ find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
*/
List *
find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
List **numparents, int *num_partitioned_children)
{
/* hash table for O(1) rel_oid -> rel_numparents cell lookup */
@@ -264,6 +275,7 @@ find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
/* Get the direct children of this rel */
currentchildren = find_inheritance_children(currentrel, lockmode,
+ lock_only_partitioned_children,
&cur_num_partitioned_children);
my_num_partitioned_children += cur_num_partitioned_children;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 10cc2b8314..4bd374632f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1282,8 +1282,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
* the children.
*/
tableOIDs =
- find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, NULL,
- NULL);
+ find_all_inheritors(RelationGetRelid(onerel), AccessShareLock, false,
+ NULL, NULL);
/*
* Check that there's at least one descendant, else fail. This could
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index 529f244f7e..771aa11b1c 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -112,7 +112,7 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait)
List *children;
ListCell *lc;
- children = find_inheritance_children(reloid, NoLock, NULL);
+ children = find_inheritance_children(reloid, NoLock, false, NULL);
foreach(lc, children)
{
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 64179ea3ef..4315028c66 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -516,7 +516,7 @@ OpenTableList(List *tables)
List *children;
children = find_all_inheritors(myrelid, ShareUpdateExclusiveLock,
- NULL, NULL);
+ false, NULL, NULL);
foreach(child, children)
{
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4d686a6f71..ef3869854a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1239,8 +1239,8 @@ ExecuteTruncate(TruncateStmt *stmt)
ListCell *child;
List *children;
- children = find_all_inheritors(myrelid, AccessExclusiveLock, NULL,
- NULL);
+ children = find_all_inheritors(myrelid, AccessExclusiveLock, false,
+ NULL, NULL);
foreach(child, children)
{
@@ -2567,7 +2567,7 @@ renameatt_internal(Oid myrelid,
* calls to renameatt() can determine whether there are any parents
* outside the inheritance hierarchy being processed.
*/
- child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
+ child_oids = find_all_inheritors(myrelid, AccessExclusiveLock, false,
&child_numparents, NULL);
/*
@@ -2595,7 +2595,7 @@ renameatt_internal(Oid myrelid,
* expected_parents will only be 0 if we are not already recursing.
*/
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited column \"%s\" must be renamed in child tables too",
@@ -2778,7 +2778,7 @@ rename_constraint_internal(Oid myrelid,
*li;
child_oids = find_all_inheritors(myrelid, AccessExclusiveLock,
- &child_numparents, NULL);
+ false, &child_numparents, NULL);
forboth(lo, child_oids, li, child_numparents)
{
@@ -2794,7 +2794,7 @@ rename_constraint_internal(Oid myrelid,
else
{
if (expected_parents == 0 &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("inherited constraint \"%s\" must be renamed in child tables too",
@@ -4807,7 +4807,7 @@ ATSimpleRecursion(List **wqueue, Relation rel,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL, NULL);
+ children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -5216,7 +5216,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
*/
if (colDef->identity &&
recurse &&
- find_inheritance_children(myrelid, NoLock, NULL) != NIL)
+ find_inheritance_children(myrelid, NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("cannot recursively add identity column to table that has child tables")));
@@ -5423,7 +5423,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
/*
* If we are told not to recurse, there had better not be any child
@@ -6543,7 +6543,7 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
if (children)
{
@@ -6978,7 +6978,7 @@ ATAddCheckConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
* use find_all_inheritors to do it in one pass.
*/
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
/*
* Check if ONLY was specified with ALTER TABLE. If so, allow the
@@ -7699,7 +7699,7 @@ ATExecValidateConstraint(Relation rel, char *constrName, bool recurse,
*/
if (!recursing && !con->connoinherit)
children = find_all_inheritors(RelationGetRelid(rel),
- lockmode, NULL, NULL);
+ lockmode, false, NULL, NULL);
/*
* For CHECK constraints, we must ensure that we only mark the
@@ -8583,7 +8583,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
*/
if (!is_no_inherit_constraint)
children = find_inheritance_children(RelationGetRelid(rel), lockmode,
- NULL);
+ false, NULL);
else
children = NIL;
@@ -8872,7 +8872,7 @@ ATPrepAlterColumnType(List **wqueue,
ListCell *child;
List *children;
- children = find_all_inheritors(relid, lockmode, NULL, NULL);
+ children = find_all_inheritors(relid, lockmode, false, NULL, NULL);
/*
* find_all_inheritors does the recursive search of the inheritance
@@ -8924,7 +8924,7 @@ ATPrepAlterColumnType(List **wqueue,
}
else if (!recursing &&
find_inheritance_children(RelationGetRelid(rel),
- NoLock, NULL) != NIL)
+ NoLock, false, NULL) != NIL)
ereport(ERROR,
(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
errmsg("type of inherited column \"%s\" must be changed in child tables too",
@@ -11036,7 +11036,7 @@ ATExecAddInherit(Relation child_rel, RangeVar *parent, LOCKMODE lockmode)
* We use weakest lock we can on child's children, namely AccessShareLock.
*/
children = find_all_inheritors(RelationGetRelid(child_rel),
- AccessShareLock, NULL, NULL);
+ AccessShareLock, false, NULL, NULL);
if (list_member_oid(children, RelationGetRelid(parent_rel)))
ereport(ERROR,
@@ -13707,7 +13707,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
* weaker lock now and the stronger one only when needed.
*/
attachrel_children = find_all_inheritors(RelationGetRelid(attachrel),
- AccessExclusiveLock, NULL,
+ AccessExclusiveLock, false, NULL,
NULL);
if (list_member_oid(attachrel_children, RelationGetRelid(rel)))
ereport(ERROR,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e2e5ffce42..70cd5721f3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -430,8 +430,8 @@ get_rel_oids(Oid relid, const RangeVar *vacrel)
oldcontext = MemoryContextSwitchTo(vac_context);
if (include_parts)
oid_list = list_concat(oid_list,
- find_all_inheritors(relid, NoLock, NULL,
- NULL));
+ find_all_inheritors(relid, NoLock, false,
+ NULL, NULL));
else
oid_list = lappend_oid(oid_list, relid);
MemoryContextSwitchTo(oldcontext);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b63abba1e4..9ee0e03a3c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -3278,8 +3278,8 @@ ExecSetupPartitionTupleRouting(Relation rel,
* Get the information about the partition tree after locking all the
* partitions.
*/
- (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL,
- NULL);
+ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, false,
+ NULL, NULL);
pds = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts);
/*
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 81865cad7d..0d20ffa2f7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1425,7 +1425,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, NULL, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
/*
* Check that there's at least one descendant, else treat as no-child
diff --git a/src/include/catalog/pg_inherits_fn.h b/src/include/catalog/pg_inherits_fn.h
index 8f371acae7..e568d11e43 100644
--- a/src/include/catalog/pg_inherits_fn.h
+++ b/src/include/catalog/pg_inherits_fn.h
@@ -18,8 +18,10 @@
#include "storage/lock.h"
extern List *find_inheritance_children(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
int *num_partitioned_children);
extern List *find_all_inheritors(Oid parentrelId, LOCKMODE lockmode,
+ bool lock_only_partitioned_children,
List **parents, int *num_partitioned_children);
extern bool has_subclass(Oid relationId);
extern bool has_superclass(Oid relationId);
--
2.11.0
Attachment: 0003-WIP-Defer-opening-and-locking-partitions-to-set_appe.patch (text/plain; charset=UTF-8)
From 35d5316f88a295576fd2c43d84a8df33e3f48728 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 26 Jul 2017 14:42:47 +0900
Subject: [PATCH 3/6] WIP: Defer opening and locking partitions to
set_append_rel_size
This will still create RT entries for the child tables in
expand_inherited_rtentry(), though not AppendRelInfos, because
they require locking and opening the relation. Having all the
RT entries created in advance means that setup_simple_rel_arrays
knows the size of root->simple_rte_array and root->simple_rel_array
to allocate.
expand_inherited_rtentry also allocates LeafPartitionInfos and
PartitionInfos for individual leaf partitions and partitioned child
tables, respectively. All LeafPartitionInfos and PartitionInfos thus
created are stuffed into the PartitionRootInfo that is created for the
parent. PartitionRootInfo was previously called PartitionedChildRelInfo.
When set_append_rel_size is called for the root parent, the whole
partition tree will be recursively processed, creating a
PartitionAppendInfo in each recursive step, which maps a given
parent table in the partition tree to its immediate partitions (and
only those that satisfy the query's WHERE condition). Once we have
PartitionAppendInfos for all the parents in the tree, we resume
set_append_rel_size() processing, creating RelOptInfos and
AppendRelInfos for the root parent's children and recursively doing
the same for its partitioned children and so on.
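
The recursive walk described above can be sketched in isolation. This is
only an illustration under simplifying assumptions: SketchParted and
collect_leaves are hypothetical names, not the patch's structs or
functions. It mirrors the convention the patch relies on, where an index
>= 0 names a leaf partition and a negative index, once negated, names the
slot of a partitioned child (slot 0 being the root).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one partitioned table's entry. */
#define SKETCH_MAX_PARTS 8

typedef struct SketchParted
{
	int			nparts;		/* number of immediate partitions */

	/*
	 * One entry per immediate partition: a value >= 0 is a leaf partition's
	 * slot; a value < 0 is the negated slot of a partitioned child.
	 */
	int			indexes[SKETCH_MAX_PARTS];
} SketchParted;

/*
 * Walk the tree rooted at parted[slot], appending leaf slots to out[] in
 * visit order.  nout is the number of entries already in out[]; the new
 * count is returned.  The caller must size out[] for all leaves.
 */
static int
collect_leaves(const SketchParted *parted, int slot, int *out, int nout)
{
	const SketchParted *p = &parted[slot];
	int			i;

	assert(p->nparts <= SKETCH_MAX_PARTS);

	for (i = 0; i < p->nparts; i++)
	{
		int			index = p->indexes[i];

		if (index >= 0)
			out[nout++] = index;	/* leaf: record it */
		else
			nout = collect_leaves(parted, -index, out, nout);	/* recurse */
	}

	return nout;
}
```

A caller would start the walk with collect_leaves(tree, 0, out, 0). The
sketch leaves out everything the real code does per level, such as
creating a PartitionAppendInfo and skipping other sessions' temp tables.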
---
src/backend/catalog/partition.c | 20 ++
src/backend/nodes/copyfuncs.c | 17 --
src/backend/nodes/equalfuncs.c | 12 -
src/backend/nodes/outfuncs.c | 59 ++++-
src/backend/optimizer/path/allpaths.c | 389 +++++++++++++++++++++++++++++++--
src/backend/optimizer/plan/planner.c | 115 +++++++++-
src/backend/optimizer/plan/setrefs.c | 26 +++
src/backend/optimizer/prep/prepunion.c | 300 +++++++++++++++----------
src/backend/optimizer/util/plancat.c | 37 ++++
src/backend/optimizer/util/relnode.c | 91 +++++++-
src/backend/utils/cache/lsyscache.c | 50 +++++
src/include/catalog/partition.h | 4 +
src/include/nodes/nodes.h | 5 +-
src/include/nodes/relation.h | 95 +++++++-
src/include/optimizer/plancat.h | 1 +
src/include/optimizer/prep.h | 3 +
src/include/utils/lsyscache.h | 2 +
src/test/regress/expected/insert.out | 4 +-
18 files changed, 1029 insertions(+), 201 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 9645381fcb..a193f02551 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1141,6 +1141,26 @@ RelationGetPartitionDispatchInfo(Relation rel,
return pd;
}
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes (from pd->indexes) of the partitions that
+ * will need to be scanned for the given scan keys.
+ *
+ * TODO: add the interface to pass the query scan keys and the logic to look
+ * up partitions using those keys.
+ */
+List *
+get_partitions_for_keys(PartitionDispatch pd)
+{
+ int i;
+ List *result = NIL;
+
+ for (i = 0; i < pd->partdesc->nparts; i++)
+ result = lappend_int(result, pd->indexes[i]);
+
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f9ddf4ed76..4c888ec3dc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2251,20 +2251,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -4996,9 +4982,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d92c03633..fb248f31f3 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -905,15 +905,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3155,9 +3146,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9ee3e23761..2480fd6429 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2213,7 +2213,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
+ WRITE_NODE_FIELD(prinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2287,6 +2287,12 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_INT_FIELD(num_parted);
+ /* don't bother printing partition_infos */
+ WRITE_INT_FIELD(num_leaf_parts);
+ /* don't bother printing leaf_part_infos */
+ WRITE_NODE_FIELD(live_partition_painfos);
+ WRITE_UINT_FIELD(root_parent_relid);
}
static void
@@ -2512,12 +2518,44 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
+_outPartitionInfo(StringInfo str, const PartitionInfo *node)
{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
+ WRITE_NODE_TYPE("PARTITIONINFO");
+
+ WRITE_BOOL_FIELD(is_other_temp);
+ WRITE_UINT_FIELD(relid);
+ /* Don't bother writing out the PartitionDispatch object */
+}
+
+static void
+_outLeafPartitionInfo(StringInfo str, const LeafPartitionInfo *node)
+{
+ WRITE_NODE_TYPE("LEAFPARTITIONINFO");
+
+ WRITE_BOOL_FIELD(is_other_temp);
+ WRITE_OID_FIELD(reloid);
+ WRITE_UINT_FIELD(relid);
+}
+
+static void
+_outPartitionAppendInfo(StringInfo str, const PartitionAppendInfo *node)
+{
+ WRITE_NODE_TYPE("PARTITIONAPPENDINFO");
+
+ WRITE_UINT_FIELD(parent_relid);
+ WRITE_NODE_FIELD(live_partition_relids);
+}
+
+static void
+_outPartitionRootInfo(StringInfo str, const PartitionRootInfo *node)
+{
+ WRITE_NODE_TYPE("PARTITIONROOTINFO");
WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
+ WRITE_NODE_FIELD(partition_infos);
+ WRITE_NODE_FIELD(partitioned_relids);
+ WRITE_NODE_FIELD(leaf_part_infos);
+ WRITE_NODE_FIELD(orig_leaf_part_oids);
}
static void
@@ -4045,8 +4083,17 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
+ case T_PartitionInfo:
+ _outPartitionInfo(str, obj);
+ break;
+ case T_LeafPartitionInfo:
+ _outLeafPartitionInfo(str, obj);
+ break;
+ case T_PartitionAppendInfo:
+ _outPartitionAppendInfo(str, obj);
+ break;
+ case T_PartitionRootInfo:
+ _outPartitionRootInfo(str, obj);
break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..c5c50e3b9d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -43,6 +44,8 @@
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
+#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -845,6 +848,172 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_rel_partitions_recurse
+ * Find partitions of the partitioned table described in partinfo,
+ * recursing for those partitions that are themselves partitioned tables
+ *
+ * rootrel is the root of the partition tree of which this table is a part.
+ * We create a PartitionAppendInfo for this partitioned table and append it to
+ * rootrel->live_partition_painfos.
+ *
+ * List of the leaf partitions of this table will be returned.
+ */
+static List *
+get_rel_partitions_recurse(RelOptInfo *rootrel,
+ PartitionInfo *partinfo,
+ PartitionInfo **all_partinfos,
+ LeafPartitionInfo **leaf_part_infos)
+{
+ PartitionAppendInfo *painfo;
+ List *indexes;
+ List *result = NIL,
+ *my_live_partitions = NIL;
+ ListCell *l;
+
+ /*
+ * Create a PartitionAppendInfo to map this table to the child tables
+ * that will be its Append children.
+ */
+ painfo = makeNode(PartitionAppendInfo);
+ painfo->parent_relid = partinfo->relid;
+
+ /* They will all be under the root table's Append node. */
+ rootrel->live_partition_painfos = lappend(rootrel->live_partition_painfos,
+ painfo);
+
+ /*
+ * TODO: collect the keys by looking at the clauses in
+ * rootrel->baserestrictinfo considering this table's partition keys.
+ */
+
+ /* Ask partition.c which partitions it thinks match the keys. */
+ indexes = get_partitions_for_keys(partinfo->pd);
+
+ /* Collect leaf partitions in the result list and recurse for others. */
+ foreach(l, indexes)
+ {
+ int index = lfirst_int(l);
+
+ if (index >= 0)
+ {
+ LeafPartitionInfo *lpinfo = leaf_part_infos[index];
+
+ if (!lpinfo->is_other_temp)
+ {
+ result = lappend_oid(result, lpinfo->reloid);
+ my_live_partitions = lappend_int(my_live_partitions,
+ lpinfo->relid);
+ }
+ }
+ else
+ {
+ PartitionInfo *recurse_partinfo = all_partinfos[-index];
+ List *my_leaf_partitions;
+
+ if (!recurse_partinfo->is_other_temp)
+ {
+ my_live_partitions = lappend_int(my_live_partitions,
+ recurse_partinfo->relid);
+ my_leaf_partitions = get_rel_partitions_recurse(rootrel,
+ recurse_partinfo,
+ all_partinfos,
+ leaf_part_infos);
+ result = list_concat(result, my_leaf_partitions);
+ }
+ }
+ }
+
+ painfo->live_partition_relids = my_live_partitions;
+
+ return result;
+}
+
+/*
+ * get_rel_partitions
+ * Recursively find partitions of rel
+ */
+static List *
+get_rel_partitions(RelOptInfo *rel)
+{
+ return get_rel_partitions_recurse(rel,
+ rel->partition_infos[0],
+ rel->partition_infos,
+ rel->leaf_part_infos);
+}
+
+/*
+ * find_partitions_for_query
+ * Find and lock partitions of rel relevant to this query
+ *
+ * Note that we only ever need to lock the leaf partitions, because the
+ * partitioned tables in the partition tree have already been locked.
+ */
+static void
+find_partitions_for_query(PlannerInfo *root, RelOptInfo *rel)
+{
+ List *leaf_part_oids = NIL;
+ ListCell *l;
+ PlanRowMark *rc = NULL;
+ int lockmode;
+ int num_leaf_parts,
+ i;
+ Oid *leaf_part_oids_array;
+ PartitionRootInfo *prinfo = NULL;
+
+ /* Find partitions. */
+ Assert(rel->partition_infos != NULL);
+ leaf_part_oids = get_rel_partitions(rel);
+
+ /* Convert the list to an array and sort for binary searching later. */
+ num_leaf_parts = list_length(leaf_part_oids);
+ leaf_part_oids_array = (Oid *) palloc(num_leaf_parts * sizeof(Oid));
+ i = 0;
+ foreach(l, leaf_part_oids)
+ {
+ leaf_part_oids_array[i++] = lfirst_oid(l);
+ }
+ qsort(leaf_part_oids_array, num_leaf_parts, sizeof(Oid), oid_cmp);
+
+ /*
+ * Now lock partitions. Note that rel cannot be a result relation or we
+ * wouldn't be here (inheritance_planner is where result relations go).
+ */
+ rc = get_plan_rowmark(root->rowMarks, rel->relid);
+ if (rc && RowMarkRequiresRowShareLock(rc->markType))
+ lockmode = RowShareLock;
+ else
+ lockmode = AccessShareLock;
+
+ /*
+ * We lock leaf partitions in the order in which find_all_inheritors
+ * found them in expand_inherited_rtentry(). Find that list by locating
+ * the PartitionRootInfo for this table.
+ */
+ foreach(l, root->prinfo_list)
+ {
+ prinfo = lfirst(l);
+
+ if (rel->relid == prinfo->parent_relid)
+ break;
+ }
+ Assert(prinfo != NULL && rel->relid == prinfo->parent_relid);
+ foreach(l, prinfo->orig_leaf_part_oids)
+ {
+ Oid relid = lfirst_oid(l);
+ Oid *test;
+
+ /* Will this leaf partition be scanned? */
+ test = (Oid *) bsearch(&relid,
+ leaf_part_oids_array,
+ num_leaf_parts,
+ sizeof(Oid), oid_cmp);
+ /* Yep, so lock. */
+ if (test != NULL)
+ LockRelationOid(relid, lockmode);
+ }
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -866,6 +1035,158 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ List *rel_appinfos = NIL;
+
+ /*
+ * Collect a list of child AppendRelInfos, which in the non-partitioned
+ * case will be found in root->append_rel_list. In the partitioned
+ * table's case, we haven't built any AppendRelInfos yet. We will
+ * build them after figuring out which of the table's child tables
+ * (aka partitions) will need to be scanned for this query.
+ */
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ Index root_parent_relid;
+ List *live_partitions,
+ *parent_vars;
+ Relation parent;
+
+ /*
+ * If this is a partitioned table root, we will determine all the
+ * partitions in this partition tree that we need to scan for this
+ * query. Among those, partitions that have not yet been locked (viz.
+ * the leaf partitions) will be locked now.
+ */
+ if (rel->partition_infos != NULL)
+ {
+ PartitionAppendInfo *painfo;
+
+ root_parent_relid = rti;
+
+ find_partitions_for_query(root, rel);
+ painfo = linitial(rel->live_partition_painfos);
+ Assert(rti == painfo->parent_relid);
+ live_partitions = painfo->live_partition_relids;
+
+ parent = rel->partition_infos[0]->pd->reldesc;
+ }
+ else
+ {
+ int i;
+ RelOptInfo *rootrel;
+
+ root_parent_relid = rel->root_parent_relid;
+ rootrel = root->simple_rel_array[root_parent_relid];
+
+ /*
+ * Just need to get hold of the PartitionAppendInfo via the root
+ * parent's RelOptInfo.
+ */
+ i = 0;
+ foreach(l, rootrel->live_partition_painfos)
+ {
+ PartitionAppendInfo *painfo = lfirst(l);
+
+ if (rti == painfo->parent_relid)
+ {
+ live_partitions = painfo->live_partition_relids;
+ break;
+ }
+
+ /* Skip to the index of this table's PartitionInfo. */
+ i++;
+ }
+
+ /*
+ * For non-root partitioned tables, we already have a relcache
+ * pointer that RelationGetPartitionDispatchInfo() acquired for
+ * us.
+ */
+ parent = rootrel->partition_infos[i]->pd->reldesc;
+ }
+
+ /*
+ * Create an AppendRelInfo and a RelOptInfo for every candidate
+ * partition.
+ */
+ parent_vars = build_rel_vars(parent, rti);
+ foreach(l, live_partitions)
+ {
+ Index childRTindex = lfirst_int(l);
+ RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+ Relation child;
+ AppendRelInfo *appinfo;
+ RelOptInfo *childrel;
+
+ child = heap_open(childrte->relid, NoLock); /* already locked! */
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = rti;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = parent->rd_rel->reltype;
+ appinfo->child_reltype = child->rd_rel->reltype;
+ appinfo->translated_vars = map_partition_varattnos(parent_vars,
+ rti,
+ child, parent,
+ NULL);
+ ChangeVarNodes((Node *) appinfo->translated_vars,
+ rti, childRTindex, 0);
+ appinfo->parent_reloid = rte->relid;
+
+ /* For the main loop below that does per-child table processing. */
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+
+ /*
+ * While at it, also add the appinfo into root->append_rel_list,
+ * so that any place that obtains a parent's children by looking
+ * them up in that list is able to do so.
+ */
+ root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this). But if this is the parent table, leave copyObject's
+ * result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ childrte->selectedCols = translate_col_privs(rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols = translate_col_privs(rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(rte->updatedCols,
+ appinfo->translated_vars);
+
+ childrel = build_simple_rel(root, childRTindex, rel);
+ childrel->root_parent_relid = root_parent_relid;
+ Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+
+ /* Copy the data that create_lateral_join_info() created */
+ Assert(childrel->direct_lateral_relids == NULL);
+ childrel->direct_lateral_relids = rel->direct_lateral_relids;
+ Assert(childrel->lateral_relids == NULL);
+ childrel->lateral_relids = rel->lateral_relids;
+ Assert(childrel->lateral_referencers == NULL);
+ childrel->lateral_referencers = rel->lateral_referencers;
+
+ root->total_table_pages += childrel->pages;
+ heap_close(child, NoLock);
+ }
+ }
Assert(IS_SIMPLE_REL(rel));
@@ -889,7 +1210,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -902,10 +1223,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1211,24 +1528,61 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
int parentRTindex = rti;
List *live_childrels = NIL;
ListCell *l;
+ List *append_rel_children = NIL;
+
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ append_rel_children = lappend_int(append_rel_children,
+ appinfo->child_relid);
+ }
+ }
+ else
+ {
+ /* For a partitioned table, first find its PartitionAppendInfo */
+ if (rel->live_partition_painfos != NIL)
+ {
+ PartitionAppendInfo *painfo;
+
+ /* This is the root partitioned rel. */
+ painfo = linitial(rel->live_partition_painfos);
+ append_rel_children = painfo->live_partition_relids;
+ }
+ else
+ {
+ RelOptInfo *rootrel;
+
+ /* Non-root partitioned table. Get it from the root rel. */
+ rootrel = root->simple_rel_array[rel->root_parent_relid];
+ foreach(l, rootrel->live_partition_painfos)
+ {
+ PartitionAppendInfo *painfo = lfirst(l);
+
+ if (rti == painfo->parent_relid)
+ {
+ append_rel_children = painfo->live_partition_relids;
+ break;
+ }
+ }
+ }
+ }
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, append_rel_children)
{
- AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
- int childRTindex;
+ int childRTindex = lfirst_int(l);
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
- childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
childrel = root->simple_rel_array[childRTindex];
@@ -1289,7 +1643,14 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte;
rte = planner_rt_fetch(rel->relid, root);
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+
+ /*
+ * Note that get_partitioned_child_rels must be called only for root
+ * partitioned tables and only those have rel->live_partition_painfos
+ * set.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->live_partition_painfos != NIL)
{
partitioned_rels = get_partitioned_child_rels(root, rel->relid);
/* The root partitioned table is included as a child rel */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 966230256e..1a85c83c50 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
+ root->prinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -1056,6 +1056,102 @@ inheritance_planner(PlannerInfo *root)
Index rti;
RangeTblEntry *parent_rte;
List *partitioned_rels = NIL;
+ List *rel_appinfos = NIL;
+ ListCell *l;
+
+ parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
+ if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ PartitionRootInfo *prinfo = NULL;
+ Relation parent;
+ List *parent_vars;
+
+ /* Find the PartitionRootInfo for this parent. */
+ foreach(l, root->prinfo_list)
+ {
+ prinfo = lfirst(l);
+
+ if (prinfo->parent_relid == parentRTindex)
+ break;
+ }
+ Assert(prinfo != NULL && prinfo->parent_relid == parentRTindex);
+
+ parent = heap_open(parent_rte->relid, NoLock);
+ parent_vars = build_rel_vars(parent, parentRTindex);
+ foreach(l, prinfo->leaf_part_infos)
+ {
+ LeafPartitionInfo *lpinfo = lfirst(l);
+ Index childRTindex = lpinfo->relid;
+ RangeTblEntry *childrte = planner_rt_fetch(childRTindex, root);
+ Relation child;
+ AppendRelInfo *appinfo;
+
+ if (childrte->relkind == RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ /*
+ * We'll need RowExclusiveLock, because just like the parent, each
+ * child is a result relation.
+ */
+ child = heap_open(childrte->relid, RowExclusiveLock);
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = parentRTindex;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = parent->rd_rel->reltype;
+ appinfo->child_reltype = child->rd_rel->reltype;
+ appinfo->translated_vars = map_partition_varattnos(parent_vars,
+ parentRTindex,
+ child, parent,
+ NULL);
+ ChangeVarNodes((Node *) appinfo->translated_vars,
+ parentRTindex, childRTindex, 0);
+ appinfo->parent_reloid = RelationGetRelid(parent);
+
+ /* For the main loop below that does per-child table planning. */
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+
+ /*
+ * While at it, also add the appinfo into root->append_rel_list,
+ * so that any places that obtain a parent's children by looking
+ * them up in that list are able to do so.
+ */
+ root->append_rel_list = lappend(root->append_rel_list, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this). But if this is the parent table, leave copyObject's
+ * result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ childrte->selectedCols =
+ translate_col_privs(parent_rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols =
+ translate_col_privs(parent_rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols =
+ translate_col_privs(parent_rte->updatedCols,
+ appinfo->translated_vars);
+ heap_close(child, NoLock);
+ }
+ heap_close(parent, NoLock);
+ }
Assert(parse->commandType != CMD_INSERT);
@@ -1121,14 +1217,13 @@ inheritance_planner(PlannerInfo *root)
* opposite in the case of non-partitioned inheritance parent as described
* below.
*/
- parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
nominalRelation = parentRTindex;
/*
* And now we can get on with generating a plan for each child table.
*/
- foreach(lc, root->append_rel_list)
+ foreach(lc, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo *subroot;
@@ -1136,10 +1231,6 @@ inheritance_planner(PlannerInfo *root)
RelOptInfo *sub_final_rel;
Path *subpath;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/*
* We need a working copy of the PlannerInfo so that we can control
* propagation of information back to the main copy.
@@ -6076,7 +6167,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
* Returns a list of the RT indexes of the partitioned child relations
* with rti as the root parent RT index.
*
- * Note: Only call this function on RTEs known to be partitioned tables.
+ * Note: Only call this function on RTEs known to be root partitioned tables.
*/
List *
get_partitioned_child_rels(PlannerInfo *root, Index rti)
@@ -6084,13 +6175,13 @@ get_partitioned_child_rels(PlannerInfo *root, Index rti)
List *result = NIL;
ListCell *l;
- foreach(l, root->pcinfo_list)
+ foreach(l, root->prinfo_list)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ PartitionRootInfo *prinfo = lfirst(l);
- if (pc->parent_relid == rti)
+ if (prinfo->parent_relid == rti)
{
- result = pc->child_rels;
+ result = prinfo->partitioned_relids;
break;
}
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b0c9e94459..4666e446d7 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -15,7 +15,9 @@
*/
#include "postgres.h"
+#include "access/heapam.h"
#include "access/transam.h"
+#include "catalog/partition.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -204,6 +206,10 @@ static bool extract_query_dependencies_walker(Node *node,
* to process targetlist and qual expressions. We can assume that the Plan
* nodes were just built by the planner and are not multiply referenced, but
* it's not so safe to assume that for expression tree nodes.
+ *
+ * Finally, we close some relcache references lingering in root. They are
+ * those of the partitioned tables whose PartitionDispatch objects are
+ * referenced from within root->prinfo_list.
*/
Plan *
set_plan_references(PlannerInfo *root, Plan *plan)
@@ -238,6 +244,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
glob->finalrowmarks = lappend(glob->finalrowmarks, newrc);
}
+ /*
+ * Close relcache references in PartitionDispatch objects referenced in
+ * root.
+ */
+ foreach(lc, root->prinfo_list)
+ {
+ PartitionRootInfo *prinfo = lfirst(lc);
+ ListCell *lc1;
+
+ foreach(lc1, prinfo->partition_infos)
+ {
+ PartitionInfo *pinfo = lfirst(lc1);
+
+ if (pinfo->pd->reldesc)
+ heap_close(pinfo->pd->reldesc, NoLock);
+ /* Shouldn't try to close again. XXX - hack? */
+ pinfo->pd->reldesc = NULL;
+ }
+ }
+
/* Now fix the Plan tree */
return set_plan_refs(root, plan, rtoffset);
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 0d20ffa2f7..01de2d778d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,6 @@ static void make_inh_translation_list(Relation oldrelation,
Relation newrelation,
Index newvarno,
List **translated_vars);
-static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
- List *translated_vars);
static Node *adjust_appendrel_attrs_mutator(Node *node,
adjust_appendrel_attrs_context *context);
static Relids adjust_child_relids(Relids relids, int nappinfos,
@@ -1352,11 +1350,19 @@ expand_inherited_tables(PlannerInfo *root)
/*
* expand_inherited_rtentry
- * Check whether a rangetable entry represents an inheritance set.
- * If so, add entries for all the child tables to the query's
- * rangetable, and build AppendRelInfo nodes for all the child tables
- * and add them to root->append_rel_list. If not, clear the entry's
- * "inh" flag to prevent later code from looking for AppendRelInfos.
+ * Perform actions necessary for applying this query to an inheritance
+ * set if the rte represents one
+ *
+ * That includes adding entries for all the child tables to the query's
+ * rangetable. Also, if this query requires a PlanRowMark, generate the same
+ * for each child table and append them to the planner's global list
+ * (root->rowMarks). If the inheritance set is really a partitioned table,
+ * our work here is done. If not, we also create AppendRelInfo nodes for
+ * all the child tables and add them to root->append_rel_list.
+ *
+ * If it turns out that the rte is not (or no longer) an inheritance set,
+ * clear the entry's "inh" flag to prevent later code from looking for
+ * AppendRelInfos.
*
* Note that the original RTE is considered to represent the whole
* inheritance set. The first of the generated RTEs is an RTE for the same
@@ -1381,9 +1387,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
List *inhOIDs;
List *appinfos;
ListCell *l;
- bool has_child;
- PartitionedChildRelInfo *pcinfo;
List *partitioned_child_rels = NIL;
+ List *partition_infos = NIL;
+ List *leaf_part_infos = NIL;
+ List *orig_leaf_part_oids;
+ int num_partitioned_children,
+ i;
+ PartitionDispatch *pds;
+ PartitionInfo *pinfo;
/* Does RT entry allow inheritance? */
if (!rte->inh)
@@ -1408,6 +1419,10 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* relation named in the query. However, for each child relation we add
* to the query, we must obtain an appropriate lock, because this will be
* the first use of those relations in the parse/rewrite/plan pipeline.
+ * For a partitioned table, we defer locking non-partitioned child tables
+ * (aka leaf partitions) to when we actually know that they will be
+ * scanned for this query. We do that by passing 'true' for
+ * lock_only_partitioned_children.
*
* If the parent relation is the query's result relation, then we need
* RowExclusiveLock. Otherwise, if it's accessed FOR UPDATE/SHARE, we
@@ -1425,7 +1440,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, false, NULL, NULL);
+ inhOIDs = find_all_inheritors(parentOID, lockmode, true, NULL,
+ &num_partitioned_children);
/*
* Check that there's at least one descendant, else treat as no-child
@@ -1460,28 +1476,43 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
List *leaf_part_oids;
- int num_parted,
- i;
- PartitionDispatch *pds;
+ int num_parted;
+ Relation rootrel;
+
+ /*
+ * Keep leaf partition OIDs around so that we can lock them in this
+ * order when we eventually do it.
+ */
+ orig_leaf_part_oids = list_copy_tail(inhOIDs,
+ num_partitioned_children + 1);
- /* Discard the original list. */
- list_free(inhOIDs);
+ /* Discard the original inhOIDs list. */
inhOIDs = NIL;
- /* Request partitioning information. */
- pds = RelationGetPartitionDispatchInfo(oldrelation, &num_parted,
+ /*
+ * Request partitioning information. We don't pass oldrelation,
+ * because we want to keep the relcache pointer in PartitionDispatch
+ * open until much later, but we'll be closing oldrelation before
+ * returning from this function.
+ */
+ rootrel = heap_open(rte->relid, NoLock);
+ pds = RelationGetPartitionDispatchInfo(rootrel, &num_parted,
&leaf_part_oids);
/*
- * First collect the partitioned child table OIDs, which includes the
- * root parent at the head.
+ * We make a PartitionInfo object for every partitioned table in the
+ * tree, including the root table. Note that we create the root
+ * table's PartitionInfo outside the loop, because inhOIDs will not
+ * contain its OID. Also add the original rti to
+ * partitioned_child_rels.
*/
- for (i = 0; i < num_parted; i++)
- {
+ pinfo = makeNode(PartitionInfo);
+ pinfo->relid = rti;
+ pinfo->pd = pds[0];
+ partition_infos = list_make1(pinfo);
+ partitioned_child_rels = list_make1_int(rti);
+ for (i = 1; i < num_parted; i++)
inhOIDs = lappend_oid(inhOIDs, RelationGetRelid(pds[i]->reldesc));
- if (pds[i]->reldesc != oldrelation)
- heap_close(pds[i]->reldesc, NoLock);
- }
/* Concatenate the leaf partition OIDs. */
inhOIDs = list_concat(inhOIDs, leaf_part_oids);
@@ -1489,20 +1520,16 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
appinfos = NIL;
- has_child = false;
+ i = 1;
foreach(l, inhOIDs)
{
Oid childOID = lfirst_oid(l);
Relation newrelation;
- RangeTblEntry *childrte;
- Index childRTindex;
+ RangeTblEntry *childrte = NULL;
+ Index childRTindex = 0;
AppendRelInfo *appinfo;
-
- /* Open rel if needed; we already have required locks */
- if (childOID != parentOID)
- newrelation = heap_open(childOID, NoLock);
- else
- newrelation = oldrelation;
+ bool is_other_temp;
+ char child_relkind = get_rel_relkind(childOID);
/*
* It is possible that the parent table has children that are temp
@@ -1510,11 +1537,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* (because of buffering issues), and the best thing to do seems to be
* to silently ignore them.
*/
- if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
- {
- heap_close(newrelation, lockmode);
- continue;
- }
+ is_other_temp = rel_is_other_temp(childOID);
/*
* Build an RTE for the child, and attach to query's rangetable list.
@@ -1528,64 +1551,22 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* The parent securityQuals will be propagated to children along with
* other base restriction clauses, so we don't need to do it here.
*/
- childrte = copyObject(rte);
- childrte->relid = childOID;
- childrte->relkind = newrelation->rd_rel->relkind;
- childrte->inh = false;
- childrte->requiredPerms = 0;
- childrte->securityQuals = NIL;
- parse->rtable = lappend(parse->rtable, childrte);
- childRTindex = list_length(parse->rtable);
-
- /*
- * Build an AppendRelInfo for this parent and child, unless the child
- * is a partitioned table.
- */
- if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+ if (!is_other_temp)
{
- /* Remember if we saw a real child. */
- if (childOID != parentOID)
- has_child = true;
-
- appinfo = makeNode(AppendRelInfo);
- appinfo->parent_relid = rti;
- appinfo->child_relid = childRTindex;
- appinfo->parent_reltype = oldrelation->rd_rel->reltype;
- appinfo->child_reltype = newrelation->rd_rel->reltype;
- make_inh_translation_list(oldrelation, newrelation, childRTindex,
- &appinfo->translated_vars);
- appinfo->parent_reloid = parentOID;
- appinfos = lappend(appinfos, appinfo);
-
- /*
- * Translate the column permissions bitmaps to the child's attnums
- * (we have to build the translated_vars list before we can do
- * this). But if this is the parent table, leave copyObject's
- * result alone.
- *
- * Note: we need to do this even though the executor won't run any
- * permissions checks on the child RTE. The
- * insertedCols/updatedCols bitmaps may be examined for
- * trigger-firing purposes.
- */
- if (childOID != parentOID)
- {
- childrte->selectedCols = translate_col_privs(rte->selectedCols,
- appinfo->translated_vars);
- childrte->insertedCols = translate_col_privs(rte->insertedCols,
- appinfo->translated_vars);
- childrte->updatedCols = translate_col_privs(rte->updatedCols,
- appinfo->translated_vars);
- }
+ childrte = copyObject(rte);
+ childrte->relid = childOID;
+ childrte->relkind = get_rel_relkind(childOID);
+ childrte->inh = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
+ childrte->requiredPerms = 0;
+ childrte->securityQuals = NIL;
+ parse->rtable = lappend(parse->rtable, childrte);
+ childRTindex = list_length(parse->rtable);
}
- else
- partitioned_child_rels = lappend_int(partitioned_child_rels,
- childRTindex);
/*
* Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
*/
- if (oldrc)
+ if (!is_other_temp && oldrc)
{
PlanRowMark *newrc = makeNode(PlanRowMark);
@@ -1606,12 +1587,89 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
*/
newrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
- /* Include child's rowmark type in parent's allMarkTypes */
- oldrc->allMarkTypes |= newrc->allMarkTypes;
root->rowMarks = lappend(root->rowMarks, newrc);
}
+ /*
+ * No need to create AppendRelInfo for partitions at this point. We
+ * will create one if and when we know we'll need it. The fact that
+ * this is a child table of the parent table will be recorded in the
+ * PartitionRootInfo that will be created for the parent table.
+ */
+ if (rel_is_partition(childOID) &&
+ child_relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ LeafPartitionInfo *lpinfo = makeNode(LeafPartitionInfo);
+
+ lpinfo->is_other_temp = is_other_temp;
+ lpinfo->reloid = childOID;
+ lpinfo->relid = childRTindex;
+ leaf_part_infos = lappend(leaf_part_infos, lpinfo);
+ continue;
+ }
+
+ /* Create the PartitionInfo of this child partitioned table. */
+ if (child_relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ PartitionInfo *pinfo = makeNode(PartitionInfo);
+
+ pinfo->is_other_temp = is_other_temp;
+ pinfo->relid = childRTindex;
+ pinfo->pd = pds[i++];
+ partition_infos = lappend(partition_infos, pinfo);
+
+ partitioned_child_rels = lappend_int(partitioned_child_rels,
+ childRTindex);
+ continue;
+ }
+
+ if (is_other_temp)
+ continue;
+
+ /*
+ * Getting here means this is a non-partitioned child table that is
+ * not a partition. Build an AppendRelInfo for the same to remember
+ * the parent-child relationship.
+ */
+
+ /* Open rel if needed, we already have required locks */
+ if (childOID != parentOID)
+ newrelation = heap_open(childOID, NoLock);
+ else
+ newrelation = oldrelation;
+
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = rti;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = oldrelation->rd_rel->reltype;
+ appinfo->child_reltype = newrelation->rd_rel->reltype;
+ make_inh_translation_list(oldrelation, newrelation, childRTindex,
+ &appinfo->translated_vars);
+ appinfo->parent_reloid = parentOID;
+ appinfos = lappend(appinfos, appinfo);
+
+ /*
+ * Translate the column permissions bitmaps to the child's attnums
+ * (we have to build the translated_vars list before we can do
+ * this). But if this is the parent table, leave copyObject's
+ * result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The
+ * insertedCols/updatedCols bitmaps may be examined for
+ * trigger-firing purposes.
+ */
+ if (childOID != parentOID)
+ {
+ childrte->selectedCols = translate_col_privs(rte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols = translate_col_privs(rte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(rte->updatedCols,
+ appinfo->translated_vars);
+ }
+
/* Close child relations, but keep locks */
if (childOID != parentOID)
heap_close(newrelation, NoLock);
@@ -1620,35 +1678,49 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
heap_close(oldrelation, NoLock);
/*
- * If all the children were temp tables or a partitioned parent did not
- * have any leaf partitions, pretend it's a non-inheritance situation; we
- * don't need Append node in that case. The duplicate RTE we added for
- * the parent table is harmless, so we don't bother to get rid of it;
- * ditto for the useless PlanRowMark node.
+ * We keep a list of objects in root, each of which maps a partitioned
+ * parent RT index to a bunch of information about the partition tree
+ * rooted at that parent. The information includes a list of RT indexes
+ * of partitioned tables appearing in the tree, a list of PartitionInfo
+ * objects for each such partitioned table, a list of LeafPartitionInfo
+ * objects for each leaf partition in tree, and finally a list containing
+ * leaf partition OIDs in the order in which find_all_inheritors() returned
+ * them. The first of these lists is copied verbatim into the Append or
+ * ModifyTable path created for the parent (and subsequently into the
+ * plan), so that it is carried over to the executor. That list is the
+ * only place where the executor can find the partitioned child tables in
+ * order to lock them.
*/
- if (!has_child)
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
- /* Clear flag before returning */
- rte->inh = false;
+ PartitionRootInfo *prinfo = makeNode(PartitionRootInfo);
+
+ Assert(list_length(partition_infos) >= 1);
+ prinfo->parent_relid = rti;
+ prinfo->partitioned_relids = partitioned_child_rels;
+ prinfo->partition_infos = partition_infos;
+ prinfo->leaf_part_infos = leaf_part_infos;
+ prinfo->orig_leaf_part_oids = orig_leaf_part_oids;
+
+ root->prinfo_list = lappend(root->prinfo_list, prinfo);
+
+ /*
+ * Our job here is done, because we didn't create any AppendRelInfos.
+ */
return;
}
/*
- * We keep a list of objects in root, each of which maps a partitioned
- * parent RT index to the list of RT indexes of its partitioned child
- * tables. When creating an Append or a ModifyTable path for the parent,
- * we copy the child RT index list verbatim to the path so that it could
- * be carried over to the executor so that the latter could identify the
- * partitioned child tables.
+ * If all the children were temp tables, pretend it's a non-inheritance
+ * situation; we don't need Append node in that case. The duplicate
+ * RTE we added for the parent table is harmless, so we don't bother to
+ * get rid of it; ditto for the useless PlanRowMark node.
*/
- if (partitioned_child_rels != NIL)
+ if (list_length(appinfos) < 2)
{
- pcinfo = makeNode(PartitionedChildRelInfo);
-
- Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
+ /* Clear flag before returning */
+ rte->inh = false;
+ return;
}
/* Otherwise, OK to add to root->append_rel_list */
@@ -1769,7 +1841,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
* query is really only going to reference the inherited columns. Instead
* we set the per-column bits for all inherited columns.
*/
-static Bitmapset *
+Bitmapset *
translate_col_privs(const Bitmapset *parent_privs,
List *translated_vars)
{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..3781a91b76 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1577,6 +1577,43 @@ build_physical_tlist(PlannerInfo *root, RelOptInfo *rel)
}
/*
+ * build_rel_vars
+ *
+ * Returns a list containing Var expressions corresponding to a relation's
+ * attributes. Since the caller already has the relation open and knows its
+ * RT index, we take those directly instead of a PlannerInfo, which avoids
+ * looking the relation up in the range table all over again.
+ */
+List *
+build_rel_vars(Relation relation, Index relid)
+{
+ AttrNumber attrno;
+ int numattrs;
+ List *result = NIL;
+
+ numattrs = RelationGetNumberOfAttributes(relation);
+ for (attrno = 1; attrno <= numattrs; attrno++)
+ {
+ Form_pg_attribute att_tup = TupleDescAttr(relation->rd_att,
+ attrno - 1);
+
+ if (att_tup->attisdropped)
+ continue;
+
+ result = lappend(result,
+ makeVar(relid,
+ attrno,
+ att_tup->atttypid,
+ att_tup->atttypmod,
+ att_tup->attcollation,
+ 0));
+
+ }
+
+ return result;
+}
+
+/*
* build_index_tlist
*
* Build a targetlist representing the columns of the specified index.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8ad0b4a669..1bcda9254f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,7 +16,9 @@
#include <limits.h>
+#include "catalog/pg_class.h"
#include "miscadmin.h"
+#include "nodes/relation.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -146,6 +148,15 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->baserestrict_min_security = UINT_MAX;
rel->joininfo = NIL;
rel->has_eclass_joins = false;
+ /* Set in build_simple_rel if rel is root partitioned table */
+ rel->num_parted = 0;
+ rel->partition_infos = NULL;
+ rel->num_leaf_parts = 0;
+ rel->leaf_part_infos = NULL;
+ /* Set in get_rel_partitions_recurse */
+ rel->live_partition_painfos = NIL;
+ /* Set in set_append_rel_size if rel is a partition. */
+ rel->root_parent_relid = 0;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -210,25 +221,83 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
list_length(rte->securityQuals));
/*
- * If this rel is an appendrel parent, recurse to build "other rel"
- * RelOptInfos for its children. They are "other rels" because they are
- * not in the main join tree, but we will need RelOptInfos to plan access
- * to them.
+ * If this rel is an appendrel parent, generate additional information
+ * based on whether the parent is a partitioned table or not. For
+ * regular parent tables, recurse to build "other rel" RelOptInfos for its
+ * children. They are "other rels" because they are not in the main join
+ * tree, but we will need RelOptInfos to plan access to them. For
+ * partitioned parent tables, we do not yet create "other rel" RelOptInfos
+ * for the children. Instead, we set up some information that will be
+ * used in set_append_rel_size() to look up its partitions.
*/
if (rte->inh)
{
ListCell *l;
- foreach(l, root->append_rel_list)
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
- AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+ PartitionRootInfo *prinfo = NULL;
+ LeafPartitionInfo **lpinfos;
+ int i;
+
+ foreach(l, root->prinfo_list)
+ {
+ if (((PartitionRootInfo *) lfirst(l))->parent_relid == relid)
+ {
+ prinfo = lfirst(l);
+ break;
+ }
+ }
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != relid)
- continue;
+ /*
+ * Only the root partitioned tables have an entry in
+ * root->prinfo_list. For other partitioned table rels, we don't
+ * need to set the following fields.
+ */
+ if (prinfo == NULL)
+ return rel;
+
+ Assert(prinfo->parent_relid == relid);
+ rel->num_parted = list_length(prinfo->partition_infos);
+ rel->num_leaf_parts = list_length(prinfo->leaf_part_infos);
+ rel->partition_infos = (PartitionInfo **)
+ palloc0(rel->num_parted *
+ sizeof(PartitionInfo *));
+ lpinfos = (LeafPartitionInfo **) palloc0(rel->num_leaf_parts *
+ sizeof(LeafPartitionInfo *));
+ i = 0;
+ foreach(l, prinfo->partition_infos)
+ {
+ rel->partition_infos[i++] = lfirst(l);
+ }
+ i = 0;
+ foreach(l, prinfo->leaf_part_infos)
+ {
+ lpinfos[i++] = lfirst(l);
+ }
+ rel->leaf_part_infos = lpinfos;
+
+ /*
+ * Don't build RelOptInfo for partitions yet; we don't know which
+ * ones we'll need. We did create RangeTblEntry's though, so we
+ * have an empty slot in root->simple_rel_array that will be
+ * filled eventually if the respective partition is chosen to be
+ * scanned after all.
+ */
+ }
+ else
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid != relid)
+ continue;
- (void) build_simple_rel(root, appinfo->child_relid,
- rel);
+ (void) build_simple_rel(root, appinfo->child_relid,
+ rel);
+ }
}
}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 82763f8013..ebbc3da985 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1817,6 +1817,28 @@ get_rel_relkind(Oid relid)
}
/*
+ * rel_is_partition
+ *
+ * Returns whether the given relation is a partition (pg_class.relispartition).
+ */
+bool
+rel_is_partition(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+ result = reltup->relispartition;
+ ReleaseSysCache(tp);
+
+ return result;
+}
+
+/*
* get_rel_tablespace
*
* Returns the pg_tablespace OID associated with a given relation.
@@ -1865,6 +1887,34 @@ get_rel_persistence(Oid relid)
return result;
}
+/*
+ * rel_is_other_temp
+ *
+ * Returns whether a relation is a temp table from another session
+ */
+bool
+rel_is_other_temp(Oid relid)
+{
+ HeapTuple tp;
+ Form_pg_class reltup;
+ bool result = false;
+
+ tp = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for relation %u", relid);
+ reltup = (Form_pg_class) GETSTRUCT(tp);
+
+ if (reltup->relpersistence == RELPERSISTENCE_TEMP &&
+ !isTempOrTempToastNamespace(reltup->relnamespace))
+ {
+ result = true;
+ }
+
+ ReleaseSysCache(tp);
+
+ return result;
+}
+
/* ---------- TRANSFORM CACHE ---------- */
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 1091dd572c..20fc3a89db 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -16,6 +16,7 @@
#include "fmgr.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
+#include "nodes/relation.h"
#include "parser/parse_node.h"
#include "utils/rel.h"
@@ -93,4 +94,7 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
EState *estate,
PartitionTupleRoutingInfo **failed_at,
TupleTableSlot **failed_slot);
+
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(PartitionDispatch pd);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..e957615ac6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,10 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
+ T_PartitionInfo,
+ T_LeafPartitionInfo,
+ T_PartitionAppendInfo,
+ T_PartitionRootInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a39e59d8ac..a67a43b069 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -251,7 +251,7 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
+ List *prinfo_list; /* list of PartitionRootInfos */
List *rowMarks; /* list of PlanRowMarks */
@@ -515,6 +515,9 @@ typedef enum RelOptKind
/* Is the given relation an "other" relation? */
#define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
+typedef struct PartitionInfo PartitionInfo;
+typedef struct LeafPartitionInfo LeafPartitionInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -592,6 +595,23 @@ typedef struct RelOptInfo
/* used by "other" relations */
Relids top_parent_relids; /* Relids of topmost parents */
+
+ /* Fields set for "root" partitioned relations */
+ int num_parted; /* Number of entries in partition_infos */
+ PartitionInfo **partition_infos;
+ int num_leaf_parts; /* Number of entries in leaf_part_infos */
+ LeafPartitionInfo **leaf_part_infos; /* LeafPartitionInfos */
+
+ /* Fields set for partitioned relations (list of PartitionAppendInfo's) */
+ List *live_partition_painfos;
+
+ /* Fields set for partition otherrels */
+
+ /*
+ * RT index of the root partitioned table in the partition tree of
+ * which this rel is a member.
+ */
+ Index root_parent_relid;
} RelOptInfo;
/*
@@ -2012,24 +2032,75 @@ typedef struct AppendRelInfo
Oid parent_reloid; /* OID of parent relation */
} AppendRelInfo;
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+
+/*
+ * PartitionInfo - information about partitioning of one partitioned table in
+ * a given partition tree
+ */
+typedef struct PartitionInfo
+{
+ NodeTag type;
+
+ bool is_other_temp; /* If true, ignore the following fields */
+ Index relid; /* Ordinal position in the rangetable */
+ PartitionDispatch pd; /* Information about partitions */
+} PartitionInfo;
+
+/*
+ * LeafPartitionInfo - (OID, RT index) pair for one leaf partition
+ *
+ * Created when a leaf partition's RT entry is created in
+ * expand_inherited_rtentry().
+ */
+typedef struct LeafPartitionInfo
+{
+ NodeTag type;
+
+ bool is_other_temp; /* If true, ignore the following fields. */
+ Oid reloid; /* OID */
+ Index relid; /* RT index */
+} LeafPartitionInfo;
+
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
+ * PartitionAppendInfo - list of child RT indexes for one partitioned table
+ * in a given partition tree
+ */
+typedef struct PartitionAppendInfo
+{
+ NodeTag type;
+
+ Index parent_relid;
+ List *live_partition_relids; /* List of RT indexes */
+} PartitionAppendInfo;
+
+/*
+ * For a partitioned table, this maps its RT index to the information about
+ * the partition tree collected in expand_inherited_rtentry().
+ *
+ * That information includes a list of PartitionInfo nodes, one for each
+ * partitioned table in the partition tree, including for the table itself.
+ * Also included is a list of RT indexes of the entries for leaf partitions
+ * that are created at the same time by expand_inherited_rtentry().
+ *
+ * orig_leaf_part_oids contains the list of leaf partition OIDs as it was
+ * generated by find_all_inheritors(). We keep it around so that we can
+ * lock leaf partitions in that order when we actually do it.
*
- * These structs are kept in the PlannerInfo node's pcinfo_list.
+ * PartitionRootInfo's for different partitioned tables in a query are placed
+ * in root->prinfo_list.
*/
-typedef struct PartitionedChildRelInfo
+typedef struct PartitionRootInfo
{
NodeTag type;
Index parent_relid;
- List *child_rels;
-} PartitionedChildRelInfo;
+ List *partition_infos;
+ List *partitioned_relids;
+ List *leaf_part_infos;
+ List *orig_leaf_part_oids;
+} PartitionRootInfo;
/*
* For each distinct placeholder expression generated during planning, we
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..e8e30f8f52 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -39,6 +39,7 @@ extern bool relation_excluded_by_constraints(PlannerInfo *root,
RelOptInfo *rel, RangeTblEntry *rte);
extern List *build_physical_tlist(PlannerInfo *root, RelOptInfo *rel);
+extern List *build_rel_vars(Relation relation, Index relid);
extern bool has_unique_index(RelOptInfo *rel, AttrNumber attno);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 4be0afd566..d0af8dc7bc 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -16,6 +16,7 @@
#include "nodes/plannodes.h"
#include "nodes/relation.h"
+#include "utils/rel.h"
/*
@@ -51,6 +52,8 @@ extern PlanRowMark *get_plan_rowmark(List *rowmarks, Index rtindex);
extern RelOptInfo *plan_set_operations(PlannerInfo *root);
extern void expand_inherited_tables(PlannerInfo *root);
+extern Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
+ List *translated_vars);
extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
int nappinfos, AppendRelInfo **appinfos);
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07208b56ce..b5b615a6fa 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -126,8 +126,10 @@ extern char *get_rel_name(Oid relid);
extern Oid get_rel_namespace(Oid relid);
extern Oid get_rel_type_id(Oid relid);
extern char get_rel_relkind(Oid relid);
+extern bool rel_is_partition(Oid relid);
extern Oid get_rel_tablespace(Oid relid);
extern char get_rel_persistence(Oid relid);
+extern bool rel_is_other_temp(Oid relid);
extern Oid get_transform_fromsql(Oid typid, Oid langid, List *trftypes);
extern Oid get_transform_tosql(Oid typid, Oid langid, List *trftypes);
extern bool get_typisdefined(Oid typid);
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index a2d9469592..e159d62b66 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -278,12 +278,12 @@ select tableoid::regclass, * from list_parted;
-------------+----+----
part_aa_bb | aA |
part_cc_dd | cC | 1
- part_null | | 0
- part_null | | 1
part_ee_ff1 | ff | 1
part_ee_ff1 | EE | 1
part_ee_ff2 | ff | 11
part_ee_ff2 | EE | 10
+ part_null | | 0
+ part_null | | 1
(8 rows)
-- some more tests to exercise tuple-routing with multi-level partitioning
--
2.11.0
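
As a reading aid for the expand_inherited_rtentry() changes in the patch above, here is a minimal standalone sketch of how the rewritten loop appears to sort each child into one of four cases. The names ChildDesc, ChildAction and classify_child are hypothetical and exist only for this illustration; the patch itself makes these decisions inline using rel_is_partition(), get_rel_relkind() and rel_is_other_temp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the per-child catalog facts the loop consults. */
typedef struct ChildDesc
{
    bool        is_partition;   /* pg_class.relispartition */
    bool        is_partitioned; /* relkind == RELKIND_PARTITIONED_TABLE */
    bool        is_other_temp;  /* temp table of another session */
} ChildDesc;

/* Hypothetical labels for what the loop does with a child. */
typedef enum ChildAction
{
    CHILD_DEFER_LEAF,           /* record a LeafPartitionInfo, no AppendRelInfo yet */
    CHILD_RECORD_PARTITIONED,   /* record a PartitionInfo and the RT index */
    CHILD_SKIP,                 /* other session's temp table, ignore */
    CHILD_BUILD_APPINFO         /* regular inheritance child, open it now */
} ChildAction;

/*
 * Mirrors the order of the checks in the patched loop: leaf partitions
 * first, then partitioned children, then the temp-table skip.
 */
static ChildAction
classify_child(const ChildDesc *child)
{
    assert(child != NULL);

    if (child->is_partition && !child->is_partitioned)
        return CHILD_DEFER_LEAF;
    if (child->is_partitioned)
        return CHILD_RECORD_PARTITIONED;
    if (child->is_other_temp)
        return CHILD_SKIP;
    return CHILD_BUILD_APPINFO;
}
```

Note that in this reading a leaf partition that is another session's temp table still lands in the first case; the patch records it with is_other_temp set rather than dropping it, presumably so that the leaf list keeps one entry per leaf partition.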
0004-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (text/plain; charset=UTF-8)
From f698fe0fe5a9454691304716711d08820e16e163 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/6] WIP: Interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 123 +++++++++++++++++++++++++++++-----------
1 file changed, 90 insertions(+), 33 deletions(-)
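
To illustrate the interface change described in the commit message, here is a minimal standalone sketch of a binary search whose probe carries its own datum count. It is a simplified model and not the code in partition.c: bounds are plain int arrays, NKEYS, ProbeArg, bound_cmp_prefix and bound_bsearch are hypothetical names, and a probe with zero datums compares equal to every bound, which is a choice of this sketch rather than of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NKEYS 2                 /* hypothetical number of partition key columns */

/* Hypothetical stand-in for the tuple-probe half of PartitionBoundCmpArg. */
typedef struct ProbeArg
{
    const int  *datums;         /* probe values for the leading key columns */
    int         ndatums;        /* how many of them are valid, 0..NKEYS */
} ProbeArg;

/*
 * Compare one bound with the probe using only the first arg->ndatums
 * columns. Returns <0, 0 or >0 as the bound is <, = or > the probe.
 */
static int
bound_cmp_prefix(const int bound[NKEYS], const ProbeArg *arg)
{
    int         i;

    assert(arg->ndatums >= 0 && arg->ndatums <= NKEYS);
    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] < arg->datums[i])
            return -1;
        if (bound[i] > arg->datums[i])
            return 1;
    }
    return 0;
}

/*
 * Return the index of the greatest bound that is <= the probe, or -1 if
 * every bound is greater. *is_equal reports whether the returned bound
 * compared equal on the columns that were looked at.
 */
static int
bound_bsearch(const int (*bounds)[NKEYS], int nbounds,
              const ProbeArg *arg, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;
        int         cmpval = bound_cmp_prefix(bounds[mid], arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

With fewer datums than key columns, several bounds can compare equal to the probe, and the search returns whichever of them it reaches first. Callers that need the first or last matching bound would have to walk outward from the returned offset.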
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index a193f02551..ed85eafc32 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -105,6 +105,30 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the user-defined
+ * partition bound of a given existing partition, while an instance of the
+ * following struct describes either a new partition bound being compared
+ * against existing bounds (is_bound is true in that case and either lbound
+ * or rbound is set), or a new tuple's partition key specified in datums
+ * (ndatums = number of partition key columns).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -131,14 +155,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
/*
* RelationBuildPartitionDesc
@@ -684,10 +709,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -738,6 +769,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo && boundinfo->ndatums > 0 &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE);
@@ -757,8 +789,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -772,9 +807,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2019,9 +2054,14 @@ get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
{
/* Else bsearch in partdesc->boundinfo */
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key, partdesc->boundinfo,
- values, false, &equal);
+ &arg, &equal);
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
@@ -2219,12 +2259,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2246,11 +2286,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2258,17 +2298,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2279,12 +2337,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2298,20 +2357,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2324,8 +2382,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
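For readers skimming the 0004 patch above, the shape of the change can be sketched in isolation. This is only a simplified illustration, not the patch's code: bounds are plain ints, and the names DemoCmpArg, demo_bound_cmp and demo_bound_bsearch are hypothetical stand-ins for PartitionBoundCmpArg, partition_bound_cmp and partition_bound_bsearch. It shows one tagged argument replacing the (void *probe, bool probe_is_bound) pair, with the search returning the greatest bound that is less than or equal to the argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified analogue of PartitionBoundCmpArg. */
typedef struct DemoCmpArg
{
    bool        is_bound;   /* true: use 'bound'; false: use 'datums' */
    int         bound;      /* stand-in for bound.rbound / bound.lbound */
    const int  *datums;     /* stand-in for a tuple's partition key values */
    int         ndatums;    /* number of valid entries in 'datums' */
} DemoCmpArg;

/* Return <0, 0 or >0 as bounds[offset] is <, = or > the value in *arg. */
static int
demo_bound_cmp(const int *bounds, int offset, const DemoCmpArg *arg)
{
    int         probe;

    if (arg->is_bound)
        probe = arg->bound;
    else if (arg->ndatums >= 1)
        probe = arg->datums[0];
    else
        return 1;           /* nothing to compare with: bound is greater */

    return (bounds[offset] > probe) - (bounds[offset] < probe);
}

/*
 * Return the offset of the greatest bound <= *arg, or -1 if all bounds
 * are greater.  *is_equal tells whether the bound at that offset matches.
 */
static int
demo_bound_bsearch(const int *bounds, int nbounds,
                   const DemoCmpArg *arg, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    assert(bounds != NULL && arg != NULL && is_equal != NULL);
    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;
        int         cmpval = demo_bound_cmp(bounds, mid, arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

A caller fills in either the bound or the datums/ndatums pair and sets is_bound accordingly, much like the memset-then-assign sequences in the hunks above.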
0005-WIP-partition.c-interface-additions-for-partition-pr.patch
From a7f5054cd9697e5f183ed5d0b674fa7235800bcb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 5/6] WIP: partition.c interface additions for
partition-pruning
Add new arguments to get_partitions_for_keys() to allow the caller
to specify bounding scan keys, along with other information about
the scan keys extracted from the query, such as NULL-ness of the
keys, inclusive-ness, etc.
The query planner side (the caller) doesn't yet pass anything for
the new arguments of get_partitions_for_keys. It will, once the
logic to extract the relevant scan keys is implemented in
a subsequent commit.
...still using constraint exclusion...
---
src/backend/catalog/partition.c | 203 ++++++++++++++++++++++++++++++++--
src/backend/optimizer/path/allpaths.c | 10 +-
src/include/catalog/partition.h | 5 +-
3 files changed, 208 insertions(+), 10 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ed85eafc32..afb85cbc37 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1181,17 +1181,204 @@ RelationGetPartitionDispatchInfo(Relation rel,
* Returns the list of indexes (from pd->indexes) of the partitions that
* will need to be scanned for the given scan keys.
*
- * TODO: add the interface to pass the query scan keys and the logic to look
- * up partitions using those keys.
+ * minkeys represents the lower bound on the partition key of the records that
+ * the query will return, while maxkeys represents the upper bound.
*/
List *
-get_partitions_for_keys(PartitionDispatch pd)
+get_partitions_for_keys(PartitionDispatch pd,
+ bool *key_is_null,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive)
{
- int i;
+ int i,
+ minoff,
+ minidx = -1,
+ maxoff,
+ maxidx = -1;
List *result = NIL;
+ PartitionKey partkey = pd->key;
+ PartitionDesc partdesc = pd->partdesc;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
- for (i = 0; i < pd->partdesc->nparts; i++)
- result = lappend_int(result, pd->indexes[i]);
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if partdesc->boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (key_is_null[i] && partdesc->boundinfo->null_index < 0)
+ {
+ minidx = INT_MAX;
+ maxidx = INT_MIN;
+ goto generate_partition_list;
+ }
+ else if (key_is_null[i])
+ {
+ minidx = maxidx = partdesc->boundinfo->null_index;
+ goto generate_partition_list;
+ }
+ }
+
+ if (n_minkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = minkeys;
+ arg.ndatums = n_minkeys;
+ minoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (min_inclusive)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 ||
+ minoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ if (is_equal)
+ minidx = partdesc->boundinfo->indexes[minoff];
+ else
+ minidx = INT_MAX;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Records returned by the query will be > bounds[minoff],
+ * because min_scankey is >= bounds[minoff], that is, no
+ * records of the partition at minoff will be returned. Go
+ * to the next bound.
+ */
+ if (minoff < partdesc->boundinfo->ndatums - 1)
+ minoff += 1;
+
+ /*
+ * Make sure to skip a gap.
+ * Note: There are ndatums + 1 slots in the indexes array.
+ */
+ if (partdesc->boundinfo->indexes[minoff] < 0 &&
+ partdesc->boundinfo->indexes[minoff + 1] >= 0)
+ minoff += 1;
+
+ /*
+ * Make sure we return a valid partition's index. It's
+ * possible that no valid partition exists, that is, all
+ * partition bounds were <= min_scankey.
+ */
+ if (partdesc->boundinfo->indexes[minoff] < 0)
+ minidx = INT_MAX;
+ else
+ minidx = partdesc->boundinfo->indexes[minoff];
+ break;
+ }
+ }
+ else
+ minidx = 0;
+
+ if (n_maxkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = maxkeys;
+ arg.ndatums = n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (max_inclusive)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 ||
+ maxoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (max_inclusive)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ if (is_equal)
+ maxidx = partdesc->boundinfo->indexes[maxoff];
+ else
+ maxidx = INT_MIN;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because bounds[maxoff] <= max_scankey, we may need to
+ * consider the next partition as well, in addition to
+ * the partition at maxoff and earlier.
+ */
+ if (!is_equal || max_inclusive)
+ maxoff += 1;
+
+ /* Make sure to skip a gap. */
+ if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1)
+ maxoff -= 1;
+
+ /*
+ * Make sure we return a valid partition's index. It's
+ * possible that no valid partition exists, that is, all
+ * partition bounds were > max_scankey.
+ */
+ if (partdesc->boundinfo->indexes[maxoff] < 0)
+ maxidx = INT_MIN;
+ else
+ maxidx = partdesc->boundinfo->indexes[maxoff];
+
+ break;
+ }
+ }
+ else
+ maxidx = partdesc->nparts - 1;
+
+generate_partition_list:
+ for (i = minidx; i <= maxidx; i++)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Multiple values may belong to the same partition, so make
+ * sure we don't add the same partition index again.
+ */
+ result = list_append_unique_int(result, pd->indexes[i]);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = lappend_int(result, pd->indexes[i]);
+ break;
+ }
+ }
return result;
}
@@ -2253,8 +2440,8 @@ partition_rbound_cmp(PartitionKey key,
/*
* partition_rbound_datum_cmp
*
- * Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
- * is <, =, or > partition key of tuple (tuple_datums)
+ * Return whether range bound (specified in rb_datums, rb_kind) is <, =, or >
+ * partition key of tuple (tuple_datums)
*/
static int32
partition_rbound_datum_cmp(PartitionKey key,
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c5c50e3b9d..a5e217674b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -869,6 +869,9 @@ get_rel_partitions_recurse(RelOptInfo *rootrel,
List *result = NIL,
*my_live_partitions = NIL;
ListCell *l;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ Datum minkeys[PARTITION_MAX_KEYS],
+ maxkeys[PARTITION_MAX_KEYS];
/*
* Create a PartitionAppendInfo to map this table to the child tables
@@ -885,9 +888,14 @@ get_rel_partitions_recurse(RelOptInfo *rootrel,
* TODO: collect the keys by looking at the clauses in
* rootrel->baserestrictinfo considering this table's partition keys.
*/
+ memset(keyisnull, false, sizeof(keyisnull));
+ memset(minkeys, 0, sizeof(minkeys));
+ memset(maxkeys, 0, sizeof(maxkeys));
/* Ask partition.c which partitions it thinks match the keys. */
- indexes = get_partitions_for_keys(partinfo->pd);
+ indexes = get_partitions_for_keys(partinfo->pd, keyisnull,
+ minkeys, 0, false,
+ maxkeys, 0, false);
/* Collect leaf partitions in the result list and recurse for others. */
foreach(l, indexes)
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 20fc3a89db..fb15498f92 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -96,5 +96,8 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
TupleTableSlot **failed_slot);
/* Planner support stuff. */
-extern List *get_partitions_for_keys(PartitionDispatch pd);
+extern List *get_partitions_for_keys(PartitionDispatch pd,
+ bool *key_is_null,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive);
#endif /* PARTITION_H */
--
2.11.0
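The range case of get_partitions_for_keys() in the 0005 patch above boils down to turning two binary-search results into a contiguous span of slots in the indexes array, then skipping gaps. Below is a much simplified sketch of that idea, assuming a single int key column and half-open ranges. It does not reproduce the patch's exact offset arithmetic, and demo_bsearch, demo_prune_range and the slots layout are hypothetical names made up for illustration. Slot s (0 <= s <= nbounds) covers values from bounds[s-1] up to but not including bounds[s], and holds -1 where no partition exists. The inclusiveness of the lower key is left out because it does not change the first slot in this simplified model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offset of the greatest bound <= v, or -1; *is_equal on an exact match. */
static int
demo_bsearch(const int *bounds, int nbounds, int v, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= v)
        {
            lo = mid;
            *is_equal = (bounds[mid] == v);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Collect into 'out' (capacity nbounds + 1) the partitions that can hold
 * values between min and max; has_min/has_max == false means that side
 * is unbounded.  Returns the number of partitions collected.
 */
static int
demo_prune_range(const int *bounds, const int *slots, int nbounds,
                 bool has_min, int min,
                 bool has_max, int max, bool max_inclusive,
                 int *out)
{
    bool        eq;
    int         first = 0;
    int         last = nbounds;
    int         n = 0;

    assert(bounds != NULL && slots != NULL && out != NULL);
    if (has_min)
        first = demo_bsearch(bounds, nbounds, min, &eq) + 1;
    if (has_max)
    {
        last = demo_bsearch(bounds, nbounds, max, &eq) + 1;
        /* "< bound" cannot reach the slot that starts at that bound. */
        if (eq && !max_inclusive)
            last--;
    }
    for (int s = first; s <= last; s++)
    {
        if (slots[s] >= 0)      /* skip gaps */
            out[n++] = slots[s];
    }
    return n;
}
```

With bounds {1, 10, 15, 20, 30, 31} and slots {0, 1, -1, 2, 3, -1, 4}, which loosely models the rlpt table from the regression tests in the 0006 patch (treating rlpt3 as a single partition), a < 1 keeps only partition 0 and a = 30 keeps nothing.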
0006-WIP-planner-side-changes-for-partition-pruning.patch
From edd4e4c951166ce26eff88f1d9f8e9d9a2b19624 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 17:31:42 +0900
Subject: [PATCH 6/6] WIP: planner-side changes for partition-pruning
This implements the planner-side logic to extract bounding scan keys
to be passed to get_partitions_for_keys. That is, it goes through
rel->baserestrictinfo, matches individual clauses to partition keys,
and constructs lower bound and upper bound tuples, which may cover only
a prefix of a multi-column partition key.
A bunch of smarts are still missing when mapping the clause operands
to keys. For example, code to match a clause specified as
(constant op var) doesn't exist yet. Also, redundant keys are not
eliminated; for example, with the clauses a = 10 and a > 1, the
latter clause a > 1 takes over, resulting in needless scanning of
partitions containing values a > 1 and a < 10.
...constraint exclusion is no longer used...
---
src/backend/catalog/partition.c | 57 +++++
src/backend/nodes/outfuncs.c | 2 +
src/backend/optimizer/path/allpaths.c | 157 ++++++++++++-
src/backend/optimizer/prep/prepunion.c | 9 +
src/backend/optimizer/util/plancat.c | 4 +
src/include/catalog/partition.h | 2 +
src/include/nodes/relation.h | 2 +
src/test/regress/expected/partition.out | 375 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 65 ++++++
11 files changed, 668 insertions(+), 8 deletions(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
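As a rough illustration of the key extraction described in the commit message above, here is a standalone sketch that builds a bounding key prefix from a flat list of clauses. It is not the code in this patch: it picks the tightest bound per column, which is one possible form of the redundant-key elimination that the patch notes as still missing. All names (DemoClause, DemoBoundKeys, demo_tightest, demo_bound_keys) are hypothetical, constants are plain ints, and clauses are assumed to be already matched to key positions. As in the patch, the prefix stops at the first column with no usable clause or with a strict inequality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_KEYS 4

typedef enum { DEMO_LT, DEMO_LE, DEMO_EQ, DEMO_GE, DEMO_GT } DemoStrategy;

typedef struct DemoClause
{
    int          keypos;    /* partition key column the clause is on */
    DemoStrategy strategy;  /* btree-style operator strategy */
    int          value;     /* constant on the right-hand side */
} DemoClause;

typedef struct DemoBoundKeys
{
    int         keys[DEMO_MAX_KEYS];
    int         nkeys;      /* length of the usable key prefix */
    bool        inclusive;  /* inclusiveness of the last key in the prefix */
} DemoBoundKeys;

/*
 * Find the tightest bound on column keypos.  lower == true considers
 * =, >= and >; lower == false considers =, <= and <.  Returns false if
 * no clause constrains that side of the column.
 */
static bool
demo_tightest(const DemoClause *clauses, int nclauses, int keypos,
              bool lower, int *value, bool *inclusive)
{
    bool        found = false;

    for (int i = 0; i < nclauses; i++)
    {
        const DemoClause *c = &clauses[i];
        bool        incl;
        bool        tighter;

        if (c->keypos != keypos)
            continue;
        if (lower ? (c->strategy == DEMO_LT || c->strategy == DEMO_LE)
                  : (c->strategy == DEMO_GT || c->strategy == DEMO_GE))
            continue;       /* clause bounds the other side */
        incl = (c->strategy != DEMO_GT && c->strategy != DEMO_LT);
        if (!found)
            tighter = true;
        else if (c->value != *value)
            tighter = lower ? (c->value > *value) : (c->value < *value);
        else
            tighter = (*inclusive && !incl);    /* same value: strict wins */
        if (tighter)
        {
            *value = c->value;
            *inclusive = incl;
        }
        found = true;
    }
    return found;
}

/* Build the lower (lower == true) or upper bounding key prefix. */
static DemoBoundKeys
demo_bound_keys(const DemoClause *clauses, int nclauses, int nkeycols,
                bool lower)
{
    DemoBoundKeys result = {{0}, 0, true};

    assert(nkeycols <= DEMO_MAX_KEYS);
    for (int i = 0; i < nkeycols; i++)
    {
        int         value = 0;
        bool        incl = true;

        if (!demo_tightest(clauses, nclauses, i, lower, &value, &incl))
            break;          /* no clause on this column: prefix ends */
        result.keys[result.nkeys++] = value;
        result.inclusive = incl;
        if (!incl)
            break;          /* strict inequality: later columns don't help */
    }
    return result;
}
```

For the clauses a = 10 and a > 1 on a three-column key, both the lower and the upper prefix come out as the single inclusive key 10, so in this sketch the looser a > 1 no longer takes over.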
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index afb85cbc37..8e57d36449 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1177,6 +1177,63 @@ RelationGetPartitionDispatchInfo(Relation rel,
}
/*
+ * get_partition_keys
+ * Returns a list of expressions matching the partition key columns
+ */
+List *
+get_partition_keys(PartitionDispatch pd, int varno)
+{
+ int i;
+ PartitionKey key = pd->key;
+ List *result = NIL;
+ ListCell *lc;
+
+ lc = list_head(key->partexprs);
+ for (i = 0; i < key->partnatts; i++)
+ {
+ Expr *keyCol;
+
+ if (key->partattrs[i] != 0)
+ {
+ keyCol = (Expr *) makeVar(varno,
+ key->partattrs[i],
+ key->parttypid[i],
+ key->parttypmod[i],
+ key->parttypcoll[i],
+ 0);
+ }
+ else
+ {
+ if (lc == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ keyCol = (Expr *) copyObject(lfirst(lc));
+ lc = lnext(lc);
+ }
+
+ result = lappend(result, keyCol);
+ }
+
+ return result;
+}
+
+/*
+ * get_partition_opfamilies
+ * Get partitioning operator family OIDs for all keys
+ */
+List *
+get_partition_opfamilies(PartitionDispatch pd)
+{
+ List *result = NIL;
+ PartitionKey key = pd->key;
+ int i;
+
+ for (i = 0; i < key->partnatts; i++)
+ result = lappend_oid(result, key->partopfamily[i]);
+
+ return result;
+}
+
+/*
* get_partitions_for_keys
* Returns the list of indexes (from pd->indexes) of the partitions that
* will need to be scanned for the given scan keys.
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2480fd6429..4d092489ba 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2524,6 +2524,8 @@ _outPartitionInfo(StringInfo str, const PartitionInfo *node)
WRITE_BOOL_FIELD(is_other_temp);
WRITE_UINT_FIELD(relid);
+ WRITE_NODE_FIELD(keys);
+ WRITE_NODE_FIELD(keyopfamilies);
/* Don't bother writing out the PartitionDispatch object */
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a5e217674b..a55ede2faa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -868,10 +868,24 @@ get_rel_partitions_recurse(RelOptInfo *rootrel,
List *indexes;
List *result = NIL,
*my_live_partitions = NIL;
- ListCell *l;
+ ListCell *lc1,
+ *lc2,
+ *keyopfamilies_item;
+ int keyPos;
+ List *matchedclauses[PARTITION_MAX_KEYS];
bool keyisnull[PARTITION_MAX_KEYS];
Datum minkeys[PARTITION_MAX_KEYS],
maxkeys[PARTITION_MAX_KEYS];
+ bool need_next_min,
+ need_next_max,
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_partkeys = list_length(partinfo->keys),
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ i;
/*
* Create a PartitionAppendInfo to map this table to the child tables
@@ -885,22 +899,151 @@ get_rel_partitions_recurse(RelOptInfo *rootrel,
painfo);
/*
- * TODO: collect the keys by looking at the clauses in
- * rootrel->baserestrictinfo considering this table's partition keys.
+ * Match individual OpExprs in the query's restriction with individual
+ * partition key columns. There is one list per key.
*/
+ memset(matchedclauses, 0, sizeof(matchedclauses));
memset(keyisnull, false, sizeof(keyisnull));
+ keyPos = 0;
+ foreach(lc1, partinfo->keys)
+ {
+ Node *partkey = lfirst(lc1);
+
+ foreach(lc2, rootrel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst(lc2);
+ Expr *clause = rinfo->clause;
+
+ if (is_opclause(clause))
+ {
+ Node *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ matchedclauses[keyPos] = lappend(matchedclauses[keyPos],
+ clause);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey) && nulltest->nulltesttype == IS_NULL)
+ keyisnull[keyPos] = true;
+ }
+ }
+
+ /* On to finding clauses matching the next partition key. */
+ keyPos++;
+ }
+
+ /*
+ * Determine the min keys and the max keys using btree semantics-based
+ * interpretation of the clauses' operators.
+ */
+
+ /*
+ * XXX - There should be a step similar to _bt_preprocess_keys() here,
+ * to eliminate any redundant scan keys for a given partition column. For
+ * example, among a <= 4 and a <= 5, we need to keep only a <= 4, which is
+ * more restrictive, and can discard a <= 5. While doing that, we can also
+ * check to see if there exists a contradictory combination of scan keys
+ * that makes the query trivially false for all records in the table.
+ */
+
memset(minkeys, 0, sizeof(minkeys));
memset(maxkeys, 0, sizeof(maxkeys));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+ need_next_min = true;
+ need_next_max = true;
+ keyopfamilies_item = list_head(partinfo->keyopfamilies);
+ for (i = 0; i < n_partkeys; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc1, matchedclauses[i])
+ {
+ Expr *clause = lfirst(lc1);
+ Const *rightop = (Const *) get_rightop(clause);
+ Oid opno = ((OpExpr *) clause)->opno,
+ opfamily = lfirst_oid(keyopfamilies_item);
+ StrategyNumber strategy;
+
+ strategy = get_op_opfamily_strategy(opno, opfamily);
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+ }
+ if (strategy == BTLessStrategyNumber)
+ need_next_max = false;
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+ }
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_min = false;
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ }
+ minkey_set[i] = true;
+ min_incl = true;
+
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ }
+ maxkey_set[i] = true;
+ max_incl = true;
+ break;
+ }
+ }
+
+ keyopfamilies_item = lnext(keyopfamilies_item);
+ }
/* Ask partition.c which partitions it thinks match the keys. */
indexes = get_partitions_for_keys(partinfo->pd, keyisnull,
- minkeys, 0, false,
- maxkeys, 0, false);
+ minkeys, n_minkeys, min_incl,
+ maxkeys, n_maxkeys, max_incl);
/* Collect leaf partitions in the result list and recurse for others. */
- foreach(l, indexes)
+ foreach(lc1, indexes)
{
- int index = lfirst_int(l);
+ int index = lfirst_int(lc1);
if (index >= 0)
{
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 01de2d778d..e5c60020b7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1509,6 +1509,8 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
pinfo = makeNode(PartitionInfo);
pinfo->relid = rti;
pinfo->pd = pds[0];
+ pinfo->keys = get_partition_keys(pinfo->pd, rti);
+ pinfo->keyopfamilies = get_partition_opfamilies(pinfo->pd);
partition_infos = list_make1(pinfo);
partitioned_child_rels = list_make1_int(rti);
for (i = 1; i < num_parted; i++)
@@ -1617,6 +1619,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
pinfo->is_other_temp = is_other_temp;
pinfo->relid = childRTindex;
pinfo->pd = pds[i++];
+
+ /* Convert so that expression contains oldrelation's attnos. */
+ pinfo->keys =
+ map_partition_varattnos(get_partition_keys(pinfo->pd, rti),
+ rti, oldrelation, pinfo->pd->reldesc,
+ NULL);
+ pinfo->keyopfamilies = get_partition_opfamilies(pinfo->pd);
partition_infos = lappend(partition_infos, pinfo);
partitioned_child_rels = lappend_int(partitioned_child_rels,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 3781a91b76..fdcb77b16f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1149,7 +1149,9 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
List *pcqual;
+#endif
/*
* We assume the relation has already been safely locked.
@@ -1235,6 +1237,7 @@ get_relation_constraints(PlannerInfo *root,
}
}
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
/* Append partition predicates, if any */
pcqual = RelationGetPartitionQual(relation);
if (pcqual)
@@ -1252,6 +1255,7 @@ get_relation_constraints(PlannerInfo *root,
result = list_concat(result, pcqual);
}
+#endif
heap_close(relation, NoLock);
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index fb15498f92..d9ca8d8371 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -96,6 +96,8 @@ extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos,
TupleTableSlot **failed_slot);
/* Planner support stuff. */
+extern List *get_partition_keys(PartitionDispatch pd, int varno);
+extern List *get_partition_opfamilies(PartitionDispatch pd);
extern List *get_partitions_for_keys(PartitionDispatch pd,
bool *key_is_null,
Datum *minkeys, int n_minkeys, bool min_inclusive,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a67a43b069..0f9bcd81ed 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2045,6 +2045,8 @@ typedef struct PartitionInfo
bool is_other_temp; /* If true, ignore the following fields */
Index relid; /* Ordinal position in the rangetable */
+ List *keys; /* Expressions for partition keys */
+ List *keyopfamilies; /* Operator family OID per key */
PartitionDispatch pd; /* Information about partitions */
} PartitionInfo;
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..0839923fca
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,375 @@
+--
+-- Test partitioning planner code
+--
+create table rlpt (a int, b varchar) partition by range (a);
+create table rlpt1 partition of rlpt for values from (minvalue) to (1);
+create table rlpt2 partition of rlpt for values from (1) to (10);
+create table rlpt3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlpt3abcd partition of rlpt3 for values in ('ab', 'cd');
+create table rlpt3efgh partition of rlpt3 for values in ('ef', 'gh');
+create table rlpt3nullxy partition of rlpt3 for values in (null, 'xy');
+alter table rlpt attach partition rlpt3 for values from (15) to (20);
+create table rlpt4 partition of rlpt for values from (20) to (30);
+create table rlpt5 partition of rlpt for values from (31) to (maxvalue);
+explain (costs off) select * from rlpt where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlpt where a <= 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a <= 1)
+ -> Seq Scan on rlpt2
+ Filter: (a <= 1)
+(5 rows)
+
+explain (costs off) select * from rlpt where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlpt2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlpt where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlpt2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlpt where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlpt5
+ Filter: ((a)::numeric = '1'::numeric)
+(15 rows)
+
+explain (costs off) select * from rlpt where a <= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a <= 10)
+ -> Seq Scan on rlpt2
+ Filter: (a <= 10)
+(5 rows)
+
+explain (costs off) select * from rlpt where a > 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlpt3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlpt3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlpt3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlpt4
+ Filter: (a > 10)
+ -> Seq Scan on rlpt5
+ Filter: (a > 10)
+(11 rows)
+
+explain (costs off) select * from rlpt where a < 15;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a < 15)
+ -> Seq Scan on rlpt2
+ Filter: (a < 15)
+(5 rows)
+
+explain (costs off) select * from rlpt where a <= 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a <= 15)
+ -> Seq Scan on rlpt2
+ Filter: (a <= 15)
+ -> Seq Scan on rlpt3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlpt3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlpt3nullxy
+ Filter: (a <= 15)
+(11 rows)
+
+explain (costs off) select * from rlpt where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlpt3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlpt4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlpt5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(7 rows)
+
+explain (costs off) select * from rlpt where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlpt3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlpt where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlpt3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlpt3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlpt3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlpt where a is null; /* while we're on nulls */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlpt where a > 30;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlpt5
+ Filter: (a > 30)
+(3 rows)
+
+explain (costs off) select * from rlpt where a = 30; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlpt where a <= 31;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlpt1
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt2
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt4
+ Filter: (a <= 31)
+ -> Seq Scan on rlpt5
+ Filter: (a <= 31)
+(15 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(9 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+(15 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+(5 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+(7 rows)
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p3
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p4
+ Filter: ((a > 1) AND (a = 10))
+(11 rows)
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlpt3 where b = 'ab' or b = 'ef';
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlpt3abcd
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlpt3efgh
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlpt3nullxy
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+(7 rows)
+
+drop table rlpt, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index eefdeeacae..e5089a7cee 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de30a7..6611662149 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..8ffe91b08f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,65 @@
+--
+-- Test partitioning planner code
+--
+create table rlpt (a int, b varchar) partition by range (a);
+create table rlpt1 partition of rlpt for values from (minvalue) to (1);
+create table rlpt2 partition of rlpt for values from (1) to (10);
+
+create table rlpt3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlpt3abcd partition of rlpt3 for values in ('ab', 'cd');
+create table rlpt3efgh partition of rlpt3 for values in ('ef', 'gh');
+create table rlpt3nullxy partition of rlpt3 for values in (null, 'xy');
+alter table rlpt attach partition rlpt3 for values from (15) to (20);
+
+create table rlpt4 partition of rlpt for values from (20) to (30);
+create table rlpt5 partition of rlpt for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlpt where a < 1;
+explain (costs off) select * from rlpt where a <= 1;
+explain (costs off) select * from rlpt where a = 1;
+explain (costs off) select * from rlpt where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlpt where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlpt where a <= 10;
+explain (costs off) select * from rlpt where a > 10;
+explain (costs off) select * from rlpt where a < 15;
+explain (costs off) select * from rlpt where a <= 15;
+explain (costs off) select * from rlpt where a > 15 and b = 'ab';
+explain (costs off) select * from rlpt where a = 16 and b is null;
+explain (costs off) select * from rlpt where a = 16 and b is not null;
+explain (costs off) select * from rlpt where a is null; /* while we're on nulls */
+explain (costs off) select * from rlpt where a > 30;
+explain (costs off) select * from rlpt where a = 30; /* empty */
+explain (costs off) select * from rlpt where a <= 31;
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlpt3 where b = 'ab' or b = 'ef';
+
+drop table rlpt, mc3p;
--
2.11.0
On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is now also the set of patches that implement the actual
partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
the attached.
It strikes me that this patch set is doing two things but maybe in the
opposite order that I would have chosen to attack them. First,
there's getting partition pruning to use something other than
constraint exclusion. Second, there's deferring work that is
currently done at an early stage of the process until later, so that
we waste less effort on partitions that are ultimately going to be
pruned.
The second one is certainly a worthwhile goal, but there are fairly
firm interdependencies between the first one and some other things
that are in progress. For example, the first one probably ought to be
done before hash partitioning gets committed, because
constraint-exclusion based partition pruning won't work with
hash partitioning, but some mechanism based on asking the
partitioning code which partitions might match will. Such a mechanism
is more efficient for list and range partitions, but it's the only
thing that will work for hash partitions. Also, Beena Emerson is
working on run-time partition pruning, and the more I think about it,
the more I think that overlaps with this first part. Both patches
need a mechanism to identify, given a btree-indexable comparison
operator (< > <= >= =) and a set of values, which partitions might
contain matching values. Run-time partition pruning will call that at
execution time, and this patch will call it at plan time, but it's the
same logic; it's just a question of the point at which the values are
known. And of course we don't want to end up with two copies of the
logic.
Therefore, IMHO, it would be best to focus first on how we're going to
identify the partitions that survive pruning, and then afterwards work
on transposing that logic to happen before partitions are opened and
locked. That way, we get some incremental benefit sooner, and also
unblock some other development work.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the comments.
On 2017/09/02 2:52, Robert Haas wrote:
On Thu, Aug 31, 2017 at 2:02 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is now also the set of patches that implement the actual
partition-pruning logic, viz. the last 3 patches (0004, 0005, and 0006) of
the attached.
It strikes me that this patch set is doing two things but maybe in the
opposite order that I would have chosen to attack them. First,
there's getting partition pruning to use something other than
constraint exclusion. Second, there's deferring work that is
currently done at an early stage of the process until later, so that
we waste less effort on partitions that are ultimately going to be
pruned.
OK.
The second one is certainly a worthwhile goal, but there are fairly
firm interdependencies between the first one and some other things
that are in progress. For example, the first one probably ought to be
done before hash partitioning gets committed, because
constraint-exclusion based partition pruning won't work with
hash partitioning, but some mechanism based on asking the
partitioning code which partitions might match will.
Yeah.
Such a mechanism
is more efficient for list and range partitions, but it's the only
thing that will work for hash partitions. Also, Beena Emerson is
working on run-time partition pruning, and the more I think about it,
the more I think that overlaps with this first part. Both patches
need a mechanism to identify, given a btree-indexable comparison
operator (< > <= >= =) and a set of values, which partitions might
contain matching values. Run-time partition pruning will call that at
execution time, and this patch will call it at plan time, but it's the
same logic; it's just a question of the point at which the values are
known. And of course we don't want to end up with two copies of the
logic.
Agreed here too.
I agree that spending effort on the first part (deferment of locking, etc.
within the planner) does not benefit either the hash partitioning or the
run-time pruning patches much.
Therefore, IMHO, it would be best to focus first on how we're going to
identify the partitions that survive pruning, and then afterwards work
on transposing that logic to happen before partitions are opened and
locked. That way, we get some incremental benefit sooner, and also
unblock some other development work.
Alright, I will try to do it that way.
Thanks,
Amit
On 2017/09/04 10:10, Amit Langote wrote:
On 2017/09/02 2:52, Robert Haas wrote:
It strikes me that this patch set is doing two things but maybe in the
opposite order that I would have chosen to attack them. First,
there's getting partition pruning to use something other than
constraint exclusion. Second, there's deferring work that is
currently done at an early stage of the process until later, so that
we waste less effort on partitions that are ultimately going to be
pruned.
OK.
Therefore, IMHO, it would be best to focus first on how we're going to
identify the partitions that survive pruning, and then afterwards work
on transposing that logic to happen before partitions are opened and
locked. That way, we get some incremental benefit sooner, and also
unblock some other development work.
Alright, I will try to do it that way.
Attached is a set of patches that does things that way. Individual patches
described below:
[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner
This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.
This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching the query's scan keys with its
partition key. We won't be able to do that if the intermediate
partitioned tables don't have a RelOptInfo.
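To illustrate why each intermediate partitioned table needs to be visited on its own, here is a minimal standalone sketch, not code from the patch. PartNode and count_leaves_to_scan are hypothetical names, only equality clauses on integer columns are modelled, and every node uses plain [lo, hi) ranges. The point is just that each level prunes its children against its own partition key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one table in a partition tree.  A node accepts the
 * range [lo, hi) of its parent's partition key; key_col is the column the
 * node itself partitions by, or -1 if the node is a leaf.
 */
typedef struct PartNode
{
	int			key_col;
	int			lo;
	int			hi;
	int			nchildren;
	const struct PartNode *children;
} PartNode;

/*
 * Count the leaf partitions that survive pruning.  has_eq[c] says whether
 * the query has a clause "column c = eq_value[c]".  Each partitioned node
 * checks only the clause on its own key column, then recurses.
 */
static int
count_leaves_to_scan(const PartNode *node, const int *eq_value,
					 const bool *has_eq)
{
	int			i;
	int			nleaves = 0;

	assert(node != NULL);

	if (node->key_col < 0)
		return 1;				/* a leaf that was not pruned by its parent */

	for (i = 0; i < node->nchildren; i++)
	{
		const PartNode *child = &node->children[i];

		if (has_eq[node->key_col])
		{
			int			v = eq_value[node->key_col];

			/* Prune the child if the value falls outside [lo, hi). */
			if (v < child->lo || v >= child->hi)
				continue;
		}
		nleaves += count_leaves_to_scan(child, eq_value, has_eq);
	}

	return nleaves;
}
```

With a flattened expansion, the sub-partitioned table's own key never gets matched against the query, which is what the per-sub-tree RelOptInfo is meant to make possible.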
[PATCH 2/5] WIP: planner-side changes for partition-pruning
This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.
Importantly, it implements the logic within the planner to match the query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.
Also, it adds certain fields to the RelOptInfos of partitioned tables that
reflect their partitioning properties.
[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}
This guy modifies the partition bound comparison function so that the
caller can pass an incomplete partition key tuple that is potentially a
prefix of a multi-column partition key. Currently, the input tuple must
contain all of key->partnatts values, but that may not be true for
queries, which may not have restrictions on all the partition key columns.
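The idea of comparing against only a prefix of the key can be sketched roughly as below. This is a simplified standalone illustration, not the patched partition_bound_cmp: bound_cmp_prefix is a hypothetical name, and bounds are plain int arrays rather than Datums compared through the partition key's support functions.

```c
#include <assert.h>

/*
 * Compare a partition bound with a possibly incomplete key tuple.
 *
 * bound has partnatts columns, but only the first nvalues columns of the
 * query's tuple are known.  Returns -1, 0 or 1 depending on whether the
 * bound sorts before, equal to, or after the tuple on that prefix.  A
 * result of 0 says nothing about the columns beyond the prefix.
 */
static int
bound_cmp_prefix(const int *bound, int partnatts,
				 const int *values, int nvalues)
{
	int			i;

	assert(nvalues >= 0 && nvalues <= partnatts);

	for (i = 0; i < nvalues; i++)
	{
		if (bound[i] < values[i])
			return -1;
		if (bound[i] > values[i])
			return 1;
	}

	return 0;
}
```

For example, the bound (10, 5, 10) compares equal to the one-column tuple (10), so a binary search driven by this function has to treat such a match as "several adjacent bounds may qualify" rather than as an exact hit.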
[PATCH 4/5] WIP: Implement get_partitions_for_keys()
This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.
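As a rough picture of what searching the bounds looks like, here is a standalone sketch for the simplest case only. It assumes a single int key column and contiguous range partitions, where partition i covers [bounds[i], bounds[i + 1]). The names bound_bsearch_le and range_partitions_for_keys are made up for this example and do not match the patch's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Greatest index i with bounds[i] <= value, or -1 if there is none. */
static int
bound_bsearch_le(const int *bounds, int nbounds, int value)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	while (lo < hi)
	{
		int			mid = lo + (hi - lo + 1) / 2;

		if (bounds[mid] <= value)
			lo = mid;
		else
			hi = mid - 1;
	}

	return lo;
}

/*
 * Compute the first and last partition that may contain rows within the
 * given bounding keys.  bounds has nparts + 1 sorted entries.  A NULL
 * minkey or maxkey means that side is unbounded.  Returns the number of
 * partitions selected; *first and *last are set only if it is non-zero.
 */
static int
range_partitions_for_keys(const int *bounds, int nparts,
						  const int *minkey, bool min_incl,
						  const int *maxkey, bool max_incl,
						  int *first, int *last)
{
	int			lo = 0;
	int			hi = nparts - 1;

	assert(nparts > 0);

	/*
	 * The partition containing minkey may hold larger values too, so it is
	 * kept whether or not the lower bound is inclusive.
	 */
	(void) min_incl;

	if (minkey != NULL)
	{
		int			idx = bound_bsearch_le(bounds, nparts + 1, *minkey);

		if (idx >= nparts)
			return 0;			/* at or past the last upper bound */
		if (idx > lo)
			lo = idx;
	}

	if (maxkey != NULL)
	{
		int			idx = bound_bsearch_le(bounds, nparts + 1, *maxkey);

		/* "key < b" cannot match the partition that starts exactly at b */
		if (!max_incl && idx >= 0 && bounds[idx] == *maxkey)
			idx--;
		if (idx < 0)
			return 0;			/* below the first lower bound */
		if (idx < hi)
			hi = idx;
	}

	if (lo > hi)
		return 0;

	*first = lo;
	*last = hi;
	return hi - lo + 1;
}
```

With bounds {1, 10, 20, 30}, "a < 10" selects only the first partition while "a <= 10" selects the first two, which is the same inclusive versus exclusive distinction the expected output in the regression tests depends on.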
[PATCH 5/5] Add more tests for the new partitioning-related planning
More tests.
Some TODO items still remain to be done:
* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests
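On the redundant-clause item, one possible approach is to fold every clause on a key column into a single interval and keep only the tightest bound on each side, so that a = 10 and a > 1 ends up as [10, 10] no matter which clause is seen last. The sketch below is standalone and only an assumption about how this could be done, not what the patches do. CmpOp, KeyRange and the function names are hypothetical, and the key is a plain int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operator codes standing in for btree strategies. */
typedef enum CmpOp
{
	OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT
} CmpOp;

/* Interval of key values still allowed by the clauses seen so far. */
typedef struct KeyRange
{
	bool		has_lo;
	bool		lo_incl;
	int			lo;
	bool		has_hi;
	bool		hi_incl;
	int			hi;
} KeyRange;

/* Raise the lower bound if (v, incl) is stricter than the current one. */
static void
tighten_lo(KeyRange *r, int v, bool incl)
{
	if (!r->has_lo || v > r->lo || (v == r->lo && !incl))
	{
		r->has_lo = true;
		r->lo = v;
		r->lo_incl = incl;
	}
}

/* Lower the upper bound if (v, incl) is stricter than the current one. */
static void
tighten_hi(KeyRange *r, int v, bool incl)
{
	if (!r->has_hi || v < r->hi || (v == r->hi && !incl))
	{
		r->has_hi = true;
		r->hi = v;
		r->hi_incl = incl;
	}
}

/* Fold the clause "key op v" into the range. */
static void
apply_clause(KeyRange *r, CmpOp op, int v)
{
	assert(r != NULL);

	switch (op)
	{
		case OP_LT:
			tighten_hi(r, v, false);
			break;
		case OP_LE:
			tighten_hi(r, v, true);
			break;
		case OP_EQ:
			tighten_lo(r, v, true);
			tighten_hi(r, v, true);
			break;
		case OP_GE:
			tighten_lo(r, v, true);
			break;
		case OP_GT:
			tighten_lo(r, v, false);
			break;
	}
}

/* True if no key value can satisfy all the folded clauses. */
static bool
range_is_empty(const KeyRange *r)
{
	if (!r->has_lo || !r->has_hi)
		return false;
	if (r->lo > r->hi)
		return true;
	return r->lo == r->hi && !(r->lo_incl && r->hi_incl);
}
```

An empty range would also give a cheap way to detect contradictory clauses before looking at any partition bounds.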
Thanks,
Amit
Attachments:
0001-Expand-partitioned-inheritance-in-a-non-flattened-ma.patch (text/plain; charset=UTF-8)
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 6 Sep 2017 09:28:14 +0900
Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner
...except when the partitioned table in question is the result rel
of the query.
This allows us to perform scan and join planning for each sub-tree in a
given partition tree, with each sub-tree's root partitioned table
getting its own RelOptInfo. Previously only the root of the whole
partition tree got a RelOptInfo, along with the leaf partitions,
with each leaf partition's AppendRelInfo pointing to the former as
its parent.
---
src/backend/optimizer/path/allpaths.c | 34 ++++++-
src/backend/optimizer/plan/planner.c | 3 +-
src/backend/optimizer/prep/prepunion.c | 166 +++++++++++++++++++++------------
src/backend/optimizer/util/plancat.c | 7 +-
4 files changed, 146 insertions(+), 64 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..6c3511bd47 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte;
rte = planner_rt_fetch(rel->relid, root);
+
+ /*
+ * Get the partitioned_rels list from root->pcinfo_list after
+ * confirming that rel is actually a root partitioned table.
+ */
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
- partitioned_rels = get_partitioned_child_rels(root, rel->relid);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+ int parent_relid;
+ bool is_root_partitioned_table = false;
+
+ /*
+ * Normally, only the root partitioned rel will be RELOPT_BASEREL
+ * in a given partition tree, except when the root table itself
+ * is a child in the case of a UNION ALL query.
+ */
+ if (!IS_OTHER_REL(rel))
+ is_root_partitioned_table = true;
+ else if (bms_get_singleton_member(rel->top_parent_relids,
+ &parent_relid))
+ {
+ RelOptInfo *parent_rel;
+
+ parent_rel = root->simple_rel_array[parent_relid];
+ is_root_partitioned_table =
+ (parent_rel->rtekind != RTE_RELATION);
+ }
+
+ if (is_root_partitioned_table)
+ {
+ partitioned_rels = get_partitioned_child_rels(root, rel->relid);
+ /* The root partitioned table is included as a child rel */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
}
/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 966230256e..02662fad5d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6076,7 +6076,8 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
* Returns a list of the RT indexes of the partitioned child relations
* with rti as the root parent RT index.
*
- * Note: Only call this function on RTEs known to be partitioned tables.
+ * Note: Only call this function on RTEs known to be "root" partitioned
+ * tables.
*/
List *
get_partitioned_child_rels(PlannerInfo *root, Index rti)
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index ccf21453fd..433505948d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -113,7 +113,8 @@ static void expand_single_inheritance_child(PlannerInfo *root,
Index parentRTindex, Relation parentrel,
PlanRowMark *parentrc, Relation childrel,
bool *has_child, List **appinfos,
- List **partitioned_child_rels);
+ List **partitioned_child_rels,
+ RangeTblEntry **newrte, Index *newRTindex);
static void make_inh_translation_list(Relation oldrelation,
Relation newrelation,
Index newvarno,
@@ -1380,8 +1381,8 @@ expand_inherited_tables(PlannerInfo *root)
* regular inheritance, a parent RTE must always have at least two associated
* AppendRelInfos: one corresponding to the parent table as a simple member of
* inheritance set and one or more corresponding to the actual children.
- * Since a partitioned table is not scanned, it might have only one associated
- * AppendRelInfo.
+ * Since a partitioned table parent is itself not scanned, it might have only
+ * one associated AppendRelInfo.
*/
static void
expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
@@ -1473,13 +1474,10 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
{
/*
* If this table has partitions, recursively expand them in the order
- * in which they appear in the PartitionDesc. But first, expand the
- * parent itself.
+ * in which they appear in the PartitionDesc. Also, start collecting
+ * the RT indexes of the partitioned tables in the partition tree.
*/
- expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc,
- oldrelation,
- &has_child, &appinfos,
- &partitioned_child_rels);
+ partitioned_child_rels = list_make1_int(rti);
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
RelationGetPartitionDesc(oldrelation),
lockmode,
@@ -1516,10 +1514,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
continue;
}
+ /*
+ * Don't expect to find any partitioned tables in a regular
+ * inheritance tree, so pass NULL for partitioned_child_rels here.
+ */
expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc,
newrelation,
&has_child, &appinfos,
- &partitioned_child_rels);
+ NULL, NULL, NULL);
/* Close child relations, but keep locks */
if (childOID != parentOID)
@@ -1581,6 +1583,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
{
Oid childOID = partdesc->oids[i];
Relation childrel;
+ RangeTblEntry *childrte;
+ Index childRTindex;
/* Open rel; we already have required locks */
childrel = heap_open(childOID, NoLock);
@@ -1592,19 +1596,60 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
continue;
}
- expand_single_inheritance_child(root, parentrte, parentRTindex,
+ expand_single_inheritance_child(root,
+ parentrte, parentRTindex,
parentrel, parentrc, childrel,
has_child, appinfos,
- partitioned_child_rels);
+ partitioned_child_rels,
+ &childrte, &childRTindex);
/* If this child is itself partitioned, recurse */
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- expand_partitioned_rtentry(root, parentrte, parentRTindex,
- parentrel, parentrc,
- RelationGetPartitionDesc(childrel),
- lockmode,
- has_child, appinfos,
- partitioned_child_rels);
+ {
+ RangeTblEntry *new_parentrte;
+ Index new_parentRTindex;
+ Relation new_parentrel;
+
+ /*
+ * For SELECT queries, it's desirable to perform scan and join
+ * planning on the individual partition sub-trees, instead of
+ * doing the same on the whole tree at once. This allows us to apply
+ * techniques such as partition-pruning and/or partition-wise join
+ * on the individual partition sub-trees. For that to happen,
+ * root parent of each sub-tree must get an RTE with inh set to
+ * true, which must be already taken care of by
+ * expand_single_inheritance_child(). Next, for each of the
+ * children, we must record immediate parent as its parent in the
+ * child AppendRelInfo, instead of the root parent of the
+ * whole tree.
+ *
+ * If parent is the query's result relation, inheritance_planner()
+ * will expand the inheritance so as to apply the *whole* query to
+ * each leaf partition, which means we cannot apply
+ * partition-pruning and/or partition-wise join to a partitioned
+ * result relation, meaning there is not much point in expanding
+ * the tree hierarchically.
+ */
+ if (parentRTindex == root->parse->resultRelation)
+ {
+ new_parentrte = parentrte;
+ new_parentRTindex = parentRTindex;
+ new_parentrel = parentrel;
+ }
+ else
+ {
+ new_parentrte = childrte;
+ new_parentRTindex = childRTindex;
+ new_parentrel = childrel;
+ }
+
+ expand_partitioned_rtentry(root, new_parentrte, new_parentRTindex,
+ new_parentrel, parentrc,
+ RelationGetPartitionDesc(childrel),
+ lockmode,
+ has_child, appinfos,
+ partitioned_child_rels);
+ }
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
@@ -1619,13 +1664,17 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* anything at all. Otherwise, we'll set "has_child" to true, build a
* RangeTblEntry and either a PartitionedChildRelInfo or AppendRelInfo as
* appropriate, plus maybe a PlanRowMark.
+ *
+ * The newly created RT entry and its RT index are returned in *newrte and
+ * *newRTindex, respectively.
*/
static void
expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *parentrc, Relation childrel,
bool *has_child, List **appinfos,
- List **partitioned_child_rels)
+ List **partitioned_child_rels,
+ RangeTblEntry **newrte, Index *newRTindex)
{
Query *parse = root->parse;
Oid parentOID = RelationGetRelid(parentrel);
@@ -1649,54 +1698,46 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
childrte = copyObject(parentrte);
childrte->relid = childOID;
childrte->relkind = childrel->rd_rel->relkind;
- childrte->inh = false;
+ childrte->inh = (childrte->relkind == RELKIND_PARTITIONED_TABLE);
childrte->requiredPerms = 0;
childrte->securityQuals = NIL;
parse->rtable = lappend(parse->rtable, childrte);
childRTindex = list_length(parse->rtable);
+ /* Build an AppendRelInfo for this parent and child. */
+
+ /* Remember if we saw a real child. */
+ if (childOID != parentOID)
+ *has_child = true;
+
+ appinfo = makeNode(AppendRelInfo);
+ appinfo->parent_relid = parentRTindex;
+ appinfo->child_relid = childRTindex;
+ appinfo->parent_reltype = parentrel->rd_rel->reltype;
+ appinfo->child_reltype = childrel->rd_rel->reltype;
+ make_inh_translation_list(parentrel, childrel, childRTindex,
+ &appinfo->translated_vars);
+ appinfo->parent_reloid = parentOID;
+ *appinfos = lappend(*appinfos, appinfo);
+
/*
- * Build an AppendRelInfo for this parent and child, unless the child is a
- * partitioned table.
+ * Translate the column permissions bitmaps to the child's attnums (we
+ * have to build the translated_vars list before we can do this). But
+ * if this is the parent table, leave copyObject's result alone.
+ *
+ * Note: we need to do this even though the executor won't run any
+ * permissions checks on the child RTE. The insertedCols/updatedCols
+ * bitmaps may be examined for trigger-firing purposes.
*/
- if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
+ if (childOID != parentOID)
{
- /* Remember if we saw a real child. */
- if (childOID != parentOID)
- *has_child = true;
-
- appinfo = makeNode(AppendRelInfo);
- appinfo->parent_relid = parentRTindex;
- appinfo->child_relid = childRTindex;
- appinfo->parent_reltype = parentrel->rd_rel->reltype;
- appinfo->child_reltype = childrel->rd_rel->reltype;
- make_inh_translation_list(parentrel, childrel, childRTindex,
- &appinfo->translated_vars);
- appinfo->parent_reloid = parentOID;
- *appinfos = lappend(*appinfos, appinfo);
-
- /*
- * Translate the column permissions bitmaps to the child's attnums (we
- * have to build the translated_vars list before we can do this). But
- * if this is the parent table, leave copyObject's result alone.
- *
- * Note: we need to do this even though the executor won't run any
- * permissions checks on the child RTE. The insertedCols/updatedCols
- * bitmaps may be examined for trigger-firing purposes.
- */
- if (childOID != parentOID)
- {
- childrte->selectedCols = translate_col_privs(parentrte->selectedCols,
- appinfo->translated_vars);
- childrte->insertedCols = translate_col_privs(parentrte->insertedCols,
- appinfo->translated_vars);
- childrte->updatedCols = translate_col_privs(parentrte->updatedCols,
- appinfo->translated_vars);
- }
+ childrte->selectedCols = translate_col_privs(parentrte->selectedCols,
+ appinfo->translated_vars);
+ childrte->insertedCols = translate_col_privs(parentrte->insertedCols,
+ appinfo->translated_vars);
+ childrte->updatedCols = translate_col_privs(parentrte->updatedCols,
+ appinfo->translated_vars);
}
- else
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
/*
* Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
@@ -1726,6 +1767,15 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
root->rowMarks = lappend(root->rowMarks, childrc);
}
+
+ if (partitioned_child_rels &&
+ childrte->relkind == RELKIND_PARTITIONED_TABLE)
+ *partitioned_child_rels = lappend_int(*partitioned_child_rels,
+ childRTindex);
+ if (newrte)
+ *newrte = childrte;
+ if (newRTindex)
+ *newRTindex = childRTindex;
}
/*
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..bfc05a1af5 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1402,8 +1402,11 @@ relation_excluded_by_constraints(PlannerInfo *root,
if (predicate_refuted_by(safe_restrictions, safe_restrictions, false))
return true;
- /* Only plain relations have constraints */
- if (rte->rtekind != RTE_RELATION || rte->inh)
+ /*
+ * Only plain relations have constraints. In addition, there can be
+ * inheritance parent RTEs that are themselves partitions.
+ */
+ if (rte->rtekind != RTE_RELATION || (rte->inh && !IS_OTHER_REL(rel)))
return false;
/*
--
2.11.0
0002-WIP-planner-side-changes-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 0f1d944c23c0f9170ffe8553ef2d22754fa3aab7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 17:31:42 +0900
Subject: [PATCH 2/5] WIP: planner-side changes for partition-pruning
Firstly, this adds a stub get_partitions_for_keys() in partition.c
with appropriate interface for the caller to specify bounding scan
keys, along with other information about the scan keys extracted
from the query, such as NULL-ness of the keys, inclusive-ness, etc.
More importantly, this implements the planner-side logic to extract
bounding scan keys to be passed to get_partitions_for_keys. That is,
it will go through rel->baserestrictinfo and match individual clauses
to partition keys and construct lower bound and upper bound tuples,
which may cover only a prefix of a multi-column partition key.
A bunch of smarts are still missing when mapping the clause operands
with keys. For example, code to match a clause specified as
(constant op var) doesn't exist. Also, redundant keys are not
eliminated, for example, a combination of clauses a = 10 and a > 1
will cause the later clause a > 1 to take over, resulting in
needless scanning of partitions containing values a > 1 and a < 10.
...constraint exclusion is still used, because
get_partitions_for_keys is just a stub...
---
src/backend/catalog/partition.c | 42 +++++
src/backend/optimizer/path/allpaths.c | 308 +++++++++++++++++++++++++++++-----
src/backend/optimizer/util/plancat.c | 120 +++++++++++++
src/backend/optimizer/util/relnode.c | 20 +++
src/include/catalog/partition.h | 8 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/relation.h | 135 +++++++++++++++
src/include/optimizer/plancat.h | 2 +
8 files changed, 595 insertions(+), 41 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 50162632f5..bb3009e5b3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1138,6 +1138,48 @@ RelationGetPartitionDispatchInfo(Relation rel,
return pd;
}
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes of rel's partitions that will need to be
+ * scanned given the bounding scan keys.
+ *
+ * Each value in the returned list can be used as an index into the oids array
+ * of the partition descriptor.
+ *
+ * Inputs:
+ * keynullness contains between 0 and (key->partnatts - 1) values, each
+ * telling what kind of NullTest has been applied to the corresponding
+ * partition key column. minkeys represents the lower bound on the partition
+ * key of the records that the query will return, while maxkeys
+ * represents the upper bound. min_inclusive and max_inclusive tell whether
+ * the bounds specified by minkeys and maxkeys are inclusive, respectively.
+ *
+ * Other outputs:
+ * *min_datum_index will return the index in boundinfo->datums of the first
+ * datum that the query's bounding keys allow to be returned for the query.
+ * Similarly, *max_datum_index. *null_partition_chosen returns whether
+ * the null partition will be scanned.
+ *
+ * TODO: Implement.
+ */
+List *
+get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen)
+{
+ List *result = NIL;
+ int i;
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+
+ for (i = 0; i < partdesc->nparts; i++)
+ result = lappend_int(result, i);
+
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6c3511bd47..97af646242 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -845,6 +846,222 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the query clauses
+ *
+ * Returned list contains the AppendRelInfos of the chosen partitions.
+ */
+static List *
+get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *indexes;
+ List *result = NIL;
+ ListCell *lc1,
+ *lc2;
+ int keyPos;
+ List *matchedclauses[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ Datum minkeys[PARTITION_MAX_KEYS],
+ maxkeys[PARTITION_MAX_KEYS];
+ bool need_next_min,
+ need_next_max,
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_minkeys = 0,
+ n_maxkeys = 0,
+ i;
+
+ /*
+ * Match individual OpExprs in the query's restriction with individual
+ * partition key columns. There is one list per key.
+ */
+ memset(keynullness, -1, sizeof(keynullness));
+ memset(matchedclauses, 0, sizeof(matchedclauses));
+ keyPos = 0;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+
+ foreach(lc2, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst(lc2);
+ Expr *clause = rinfo->clause;
+
+ if (is_opclause(clause))
+ {
+ Node *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ matchedclauses[keyPos] = lappend(matchedclauses[keyPos],
+ clause);
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[keyPos] = IS_NOT_NULL;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ keynullness[keyPos] = nulltest->nulltesttype;
+ }
+ }
+
+ /* Onto finding clauses matching the next partition key. */
+ keyPos++;
+ }
+
+ /*
+ * Determine the min keys and the max keys using btree semantics-based
+ * interpretation of the clauses' operators.
+ */
+
+ /*
+ * XXX - There should be a step similar to _bt_preprocess_keys() here,
+ * to eliminate any redundant scan keys for a given partition column. For
+ * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being
+ * more restrictive and discard a <= 5. While doing that, we can also
+ * check to see if there exists a contradictory combination of scan keys
+ * that makes the query trivially false for all records in the table.
+ */
+ memset(minkeys, 0, sizeof(minkeys));
+ memset(maxkeys, 0, sizeof(maxkeys));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc1, matchedclauses[i])
+ {
+ Expr *clause = lfirst(lc1);
+ Const *rightop = (Const *) get_rightop(clause);
+ Oid opno = ((OpExpr *) clause)->opno,
+ opfamily = rel->part_scheme->partopfamily[i];
+ StrategyNumber strategy;
+
+ strategy = get_op_opfamily_strategy(opno, opfamily);
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+ }
+ if (strategy == BTLessStrategyNumber)
+ need_next_max = false;
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+ }
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_min = false;
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ }
+ minkey_set[i] = true;
+ min_incl = true;
+
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ }
+ maxkey_set[i] = true;
+ max_incl = true;
+ break;
+
+ /*
+ * This might mean '<>', but we don't have anything for that
+ * case yet. Perhaps, handle that as key < const OR
+ * key > const, once we have props needed for handling OR
+ * clauses.
+ */
+ default:
+ min_incl = max_incl = false;
+ break;
+ }
+ }
+ }
+
+ /* Ask partition.c which partitions it thinks match the keys. */
+ indexes = get_partitions_for_keys(parent, keynullness,
+ minkeys, n_minkeys, min_incl,
+ maxkeys, n_maxkeys, max_incl,
+ &rel->painfo->min_datum_idx,
+ &rel->painfo->max_datum_idx,
+ &rel->painfo->contains_null_partition);
+
+ if (indexes != NIL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ int first_index,
+ last_index;
+ first_index = linitial_int(indexes);
+ last_index = llast_int(indexes);
+ Assert(first_index <= last_index ||
+ rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE);
+#endif
+
+ foreach(lc1, indexes)
+ {
+ int partidx = lfirst_int(lc1);
+ AppendRelInfo *appinfo = rel->child_appinfos[partidx];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+ Assert(partdesc->oids[partidx] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->painfo->live_partition_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -859,6 +1076,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -869,6 +1087,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_rel_partitions(root, rel, rte);
+ Assert(rel->painfo != NULL);
+ rel->painfo->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -889,7 +1125,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -902,10 +1138,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1114,6 +1346,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->painfo && rel->painfo)
+ {
+ rel->painfo->live_partitioned_rels =
+ list_concat(rel->painfo->live_partitioned_rels,
+ list_copy(childrel->painfo->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1209,14 +1452,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->painfo->live_partition_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1289,40 +1547,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte;
rte = planner_rt_fetch(rel->relid, root);
-
- /*
- * Get the partitioned_rels list from root->pcinfo_list after
- * confirming that rel is actually a root partitioned table.
- */
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- {
- int parent_relid;
- bool is_root_partitioned_table = false;
-
- /*
- * Normally, only the root partitioned rel will be RELOPT_BASEREL
- * in a given partitione tree, except when the root table itself
- * is a child in the case of a UNION ALL query.
- */
- if (!IS_OTHER_REL(rel))
- is_root_partitioned_table = true;
- else if (bms_get_singleton_member(rel->top_parent_relids,
- &parent_relid))
- {
- RelOptInfo *parent_rel;
-
- parent_rel = root->simple_rel_array[parent_relid];
- is_root_partitioned_table =
- (parent_rel->rtekind != RTE_RELATION);
- }
-
- if (is_root_partitioned_table)
- {
- partitioned_rels = get_partitioned_child_rels(root, rel->relid);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
- }
- }
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE && IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->painfo->live_partitioned_rels;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bfc05a1af5..de50b5d86a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root,
static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
Relation heapRelation);
static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ Relation relation);
/*
* get_relation_info -
@@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
+ /* Collect partitioning info, if relevant. */
+ if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ get_relation_partition_info(root, rel, relation);
+
heap_close(relation, NoLock);
/*
@@ -1805,3 +1811,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType event)
heap_close(relation, NoLock);
return result;
}
+
+static void
+get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ Relation relation)
+{
+ int i;
+ ListCell *l;
+ PartitionKey key = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ rel->part_scheme = find_partition_scheme(root, relation);
+ rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *));
+
+ l = list_head(key->partexprs);
+ for (i = 0; i < key->partnatts; i++)
+ {
+ Expr *keyCol;
+
+ if (key->partattrs[i] != 0)
+ {
+ keyCol = (Expr *) makeVar(rel->relid,
+ key->partattrs[i],
+ key->parttypid[i],
+ key->parttypmod[i],
+ key->parttypcoll[i],
+ 0);
+ }
+ else
+ {
+ if (l == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ keyCol = (Expr *) copyObject(lfirst(l));
+ l = lnext(l);
+ }
+
+ rel->partexprs[i] = list_make1(keyCol);
+ }
+
+ /* Values are filled in build_simple_rel(). */
+ rel->child_appinfos = (AppendRelInfo **) palloc0(partdesc->nparts *
+ sizeof(AppendRelInfo *));
+
+ /*
+ * A PartitionAppendInfo to map this table to its immediate partitions
+ * that will be scanned by this query. At the same time, it records the
+ * table's partitioning properties reflecting any partition-pruning that
+ * might occur to satisfy the query. Rest of the fields are set in
+ * get_rel_partitions() and set_append_rel_size().
+ */
+ rel->painfo = makeNode(PartitionAppendInfo);
+ rel->painfo->boundinfo = partdesc->boundinfo;
+}
+
+/*
+ * find_partition_scheme
+ *
+ * The function returns a canonical partition scheme which exactly matches the
+ * partitioning scheme of the given relation if one exists in the list of
+ * canonical partitioning schemes maintained in PlannerInfo. If none of the
+ * existing partitioning schemes match, the function creates a canonical
+ * partition scheme and adds it to the list.
+ *
+ * For an unpartitioned table or for a multi-level partitioned table it returns
+ * NULL. See comments in the function for more details.
+ */
+PartitionScheme
+find_partition_scheme(PlannerInfo *root, Relation relation)
+{
+ ListCell *lc;
+ PartitionKey key = RelationGetPartitionKey(relation);
+ char strategy = key->strategy;
+ int partnatts = key->partnatts;
+ PartitionScheme part_scheme = NULL;
+
+ /* Search for a matching partition scheme and return it if found. */
+ foreach(lc, root->partition_schemes)
+ {
+ part_scheme = lfirst(lc);
+
+ /* Match various partitioning attributes. */
+ if (strategy != part_scheme->strategy ||
+ partnatts != part_scheme->partnatts ||
+ memcmp(key->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->parttypmod, part_scheme->parttypmod,
+ sizeof(int32) * partnatts) != 0 ||
+ memcmp(key->partcollation, part_scheme->partcollation,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->partopfamily, part_scheme->partopfamily,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->partopcintype, part_scheme->partopcintype,
+ sizeof(Oid) * partnatts) != 0)
+ continue;
+
+ /* Found a matching partition scheme. */
+ return part_scheme;
+ }
+
+ /* Did not find matching partition scheme. Create one. */
+ part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData));
+
+ part_scheme->strategy = strategy;
+ part_scheme->partnatts = partnatts;
+ part_scheme->parttypid = key->parttypid;
+ part_scheme->parttypmod = key->parttypmod;
+ part_scheme->partcollation = key->partcollation;
+ part_scheme->partopfamily = key->partopfamily;
+ part_scheme->partopcintype = key->partopcintype;
+
+ /* Add the partitioning scheme to PlannerInfo. */
+ root->partition_schemes = lappend(root->partition_schemes, part_scheme);
+
+ return part_scheme;
+}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8ad0b4a669..390d3b4956 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
#include <limits.h>
#include "miscadmin.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -163,6 +164,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
else
rel->top_parent_relids = NULL;
+ rel->child_appinfos = NULL;
+ rel->part_scheme = NULL;
+ rel->partexprs = NULL;
+ rel->painfo = NULL;
+
/* Check type of rtable entry */
switch (rte->rtekind)
{
@@ -218,7 +224,18 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
if (rte->inh)
{
ListCell *l;
+ AppendRelInfo **child_appinfos = NULL;
+ int i;
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->child_appinfos != NULL);
+ Assert(rel->painfo != NULL);
+ child_appinfos = rel->child_appinfos;
+ }
+
+ i = 0;
foreach(l, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
@@ -229,6 +246,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
(void) build_simple_rel(root, appinfo->child_relid,
rel);
+
+ if (child_appinfos)
+ child_appinfos[i++] = appinfo;
}
}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2283c675e9..fd16494909 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -99,4 +99,12 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
EState *estate,
PartitionDispatchData **failed_at,
TupleTableSlot **failed_slot);
+
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..63196a1211 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
T_SpecialJoinInfo,
T_AppendRelInfo,
T_PartitionedChildRelInfo,
+ T_PartitionAppendInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a39e59d8ac..2b535984a7 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -266,6 +266,8 @@ typedef struct PlannerInfo
List *distinct_pathkeys; /* distinctClause pathkeys, if any */
List *sort_pathkeys; /* sortClause pathkeys, if any */
+ List *partition_schemes; /* List of PartitionScheme objects. */
+
List *initial_rels; /* RelOptInfos we are now trying to join */
/* Use fetch_upper_rel() to get any particular upper rel */
@@ -326,6 +328,48 @@ typedef struct PlannerInfo
((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
rt_fetch(rti, (root)->parse->rtable))
+/*
+ * Partitioning scheme
+ * Structure to hold partitioning scheme for a given relation.
+ *
+ * Multiple relations may be partitioned in the same way. The relations
+ * resulting from joining such relations may be partitioned in the same way as
+ * the joining relations. Similarly, relations derived from such relations by
+ * grouping, sorting may be partitioned in the same way as the underlying scan
+ * relations. All such relations partitioned in the same way share the
+ * partitioning scheme.
+ *
+ * PlannerInfo stores a list of distinct "canonical" partitioning schemes.
+ * RelOptInfo of a partitioned relation holds the pointer to "canonical"
+ * partitioning scheme.
+ *
+ * We store opclass declared input data types instead of partition key
+ * datatypes since those are the ones used to compare partition bounds instead
+ * of actual partition key data types. Since partition key data types and the
+ * opclass declared input data types are expected to be binary compatible (per
+ * ResolveOpClass()), both of those should have same byval and length
+ * properties.
+ *
+ * The structure caches information about partition key data type to be used
+ * while matching partition bounds. While comparing partition schemes we don't
+ * need to compare this information as it should be same when opclass declared
+ * input data types are same for two partitioned relations.
+ */
+typedef struct PartitionSchemeData
+{
+ char strategy; /* Partitioning strategy */
+ int16 partnatts; /* Number of partitioning attributes */
+
+ /* The following arrays each have partnatts members. */
+ Oid *parttypid; /* Type OIDs */
+ int32 *parttypmod; /* Typemod values */
+ Oid *partcollation; /* Partitioning collation */
+ Oid *partopfamily; /* Operator family OIDs */
+ Oid *partopcintype; /* Operator class-declared input type OIDs */
+} PartitionSchemeData;
+
+typedef struct PartitionSchemeData *PartitionScheme;
+
/*----------
* RelOptInfo
@@ -515,6 +559,9 @@ typedef enum RelOptKind
/* Is the given relation an "other" relation? */
#define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
+typedef struct AppendRelInfo AppendRelInfo;
+typedef struct PartitionAppendInfo PartitionAppendInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -592,6 +639,48 @@ typedef struct RelOptInfo
/* used by "other" relations */
Relids top_parent_relids; /* Relids of topmost parents */
+
+ /* Fields set for partitioned relations */
+
+ /*
+ * Information about the partitioning attributes, such as the number of
+ * attributes, arrays containing per-attribute type/typmod, partitioning
+ * collation, operator family OIDs, etc.
+ */
+ PartitionScheme part_scheme;
+
+ /*
+ * Following contains the exact identities of the individual partitioning
+ * attributes. For example, if the attribute is a table's column, then
+ * it will be represented herein by a Var node for the same. This is
+ * structured as an array of Lists with part_scheme->partnatts members,
+ * with each list containing the expression(s) corresponding to the ith
+ * partitioning attribute (0 <= i < part_scheme->partnatts) of this rel.
+ * For baserels, there is just a single expression in each slot (the ith
+ * list) of the array, because it corresponds to just one table. But for
+ * a joinrel, there will be as many expressions as there are tables
+ * involved in that joinrel. We have to do it that way, because in the
+ * joinrel case, the same corresponding partitioning attribute may have
+ * different identities in different tables involved in the join; for
+ * example, a Var node's varno will differ and so might varattnos.
+ */
+ List **partexprs;
+
+ /* AppendRelInfos of *all* partitions of the table. */
+ AppendRelInfo **child_appinfos;
+
+ /*
+ * For a partitioned relation, the following represents the identities
+ * of its live partitions (their RT indexes) and some information about
+ * the bounds that the live partitions satisfy.
+ */
+ PartitionAppendInfo *painfo;
+
+ /*
+ * RT index of the root partitioned table in the partition tree of
+ * which this rel is a member.
+ */
+ Index root_parent_relid;
} RelOptInfo;
/*
@@ -2031,6 +2120,52 @@ typedef struct PartitionedChildRelInfo
List *child_rels;
} PartitionedChildRelInfo;
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+typedef struct PartitionKeyData *PartitionKey;
+
+/*
+ * PartitionAppendInfo - Properties of partitions contained in the Append path
+ * of a given partitioned table
+ */
+typedef struct PartitionAppendInfo
+{
+ NodeTag type;
+
+ /*
+ * List of AppendRelInfos of the table's partitions that satisfy a given
+ * query.
+ */
+ List *live_partition_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
+
+ /*
+ * The following simply copies the pointer to boundinfo in the table's
+ * PartitionDesc.
+ */
+ PartitionBoundInfo boundinfo;
+
+ /*
+ * Indexes in the boundinfo->datums array of the smallest and the largest
+ * value of the partition key that the query allows. They are set by
+ * calling get_partitions_for_keys().
+ */
+ int min_datum_idx;
+ int max_datum_idx;
+
+ /*
+ * Does this Append contain the null-accepting partition, if one exists
+ * and is allowed by the query's quals?
+ */
+ bool contains_null_partition;
+} PartitionAppendInfo;
+
/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..c45db074c6 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -56,5 +56,7 @@ extern Selectivity join_selectivity(PlannerInfo *root,
SpecialJoinInfo *sjinfo);
extern bool has_row_triggers(PlannerInfo *root, Index rti, CmdType event);
+extern PartitionScheme find_partition_scheme(PlannerInfo *root,
+ Relation relation);
#endif /* PLANCAT_H */
--
2.11.0
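
As a rough illustration of the clause-to-bounds step that get_rel_partitions()
performs in the patch above, here is a standalone sketch for a single integer
key column. It is not the patch's code: the names CmpOp, Clause, KeyRange,
fold_clauses and range_is_empty are made up for this example, and unlike the
patch it also drops redundant clauses and detects contradictory ones, which is
the step the XXX comment says is still missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } CmpOp;

/* One restriction clause of the form "key op value". */
typedef struct
{
	CmpOp		op;
	int			value;
} Clause;

/* The range of key values that the clauses allow. */
typedef struct
{
	bool		has_min, has_max;
	int			min, max;
	bool		min_incl, max_incl;
} KeyRange;

/* Tighten the lower bound if (value, incl) is more restrictive. */
static void
apply_min(KeyRange *r, int value, bool incl)
{
	if (!r->has_min || value > r->min || (value == r->min && !incl))
	{
		r->has_min = true;
		r->min = value;
		r->min_incl = incl;
	}
}

/* Tighten the upper bound if (value, incl) is more restrictive. */
static void
apply_max(KeyRange *r, int value, bool incl)
{
	if (!r->has_max || value < r->max || (value == r->max && !incl))
	{
		r->has_max = true;
		r->max = value;
		r->max_incl = incl;
	}
}

/* Fold all clauses on one key column into a single range. */
static KeyRange
fold_clauses(const Clause *clauses, size_t n)
{
	KeyRange	r = {false, false, 0, 0, false, false};

	assert(clauses != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
	{
		switch (clauses[i].op)
		{
			case OP_LT: apply_max(&r, clauses[i].value, false); break;
			case OP_LE: apply_max(&r, clauses[i].value, true); break;
			case OP_GT: apply_min(&r, clauses[i].value, false); break;
			case OP_GE: apply_min(&r, clauses[i].value, true); break;
			case OP_EQ:			/* equality bounds the key on both sides */
				apply_min(&r, clauses[i].value, true);
				apply_max(&r, clauses[i].value, true);
				break;
		}
	}
	return r;
}

/* True if no key value can satisfy the range, e.g. a > 5 AND a < 5. */
static bool
range_is_empty(const KeyRange *r)
{
	if (!r->has_min || !r->has_max)
		return false;
	return r->min > r->max ||
		(r->min == r->max && !(r->min_incl && r->max_incl));
}
```

With a >= 10, a < 50 and a <= 40 as input, the folded range is [10, 40] with
both ends inclusive. A real implementation would work on Datums and use the
partitioning operator family's comparison function rather than plain integer
comparison.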
Attachment: 0003-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (text/plain; charset=UTF-8)
From 1921fc38dee9bd89f42f89012dd1d57ead4dc951 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 123 +++++++++++++++++++++++++++++-----------
1 file changed, 90 insertions(+), 33 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index bb3009e5b3..14876f8ea3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -105,6 +105,30 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the user-defined
+ * partition bound of a given existing partition, while an instance of the
+ * following struct describes either a new partition bound being compared
+ * against existing bounds (is_bound is true in that case and either lbound
+ * or rbound is set), or a new tuple's partition key specified in datums
+ * (ndatums = number of partition key columns).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -131,14 +155,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
/*
* RelationBuildPartitionDesc
@@ -663,10 +688,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -717,6 +748,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo && boundinfo->ndatums > 0 &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE);
@@ -736,8 +768,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -751,9 +786,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2037,9 +2072,14 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
/* Else bsearch in partdesc->boundinfo */
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key, partdesc->boundinfo,
- values, false, &equal);
+ &arg, &equal);
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
@@ -2237,12 +2277,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2264,11 +2304,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2276,17 +2316,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2297,12 +2355,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2316,20 +2375,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2342,8 +2400,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
0004-WIP-Implement-get_partitions_for_keys.patch (text/plain; charset=UTF-8)
From 240d2d65fbb4af5959df8e2e1ae576bf46440d4d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 4/5] WIP: Implement get_partitions_for_keys()
Disable constraint exclusion that occurs using internal partition
constraints, so that it's apparent what the new partition-pruning
code still needs to do to be able to create a plan matching the plan
that the traditional constraint-exclusion-based partition pruning
would result in.
---
src/backend/catalog/partition.c | 193 ++++++++++++++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +
2 files changed, 192 insertions(+), 5 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 14876f8ea3..617900f62f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1194,8 +1194,6 @@ RelationGetPartitionDispatchInfo(Relation rel,
* datum that the query's bounding keys allow to be returned for the query.
* Similarly, *max_datum_index. *null_partition_chosen returns whether
* the null partition will be scanned.
- *
- * TODO: Implement.
*/
List *
get_partitions_for_keys(Relation rel,
@@ -1205,12 +1203,197 @@ get_partitions_for_keys(Relation rel,
int *min_datum_index, int *max_datum_index,
bool *null_partition_chosen)
{
+ int i,
+ minoff,
+ maxoff;
List *result = NIL;
- int i;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+ int null_partition_idx = partdesc->boundinfo->null_index;
+
+ *null_partition_chosen = false;
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if partdesc->boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keynullness[i] == IS_NULL)
+ {
+ if (null_partition_idx >= 0)
+ {
+ *null_partition_chosen = true;
+ result = list_make1_int(null_partition_idx);
+ }
+ else
+ result = NIL;
+
+ return result;
+ }
+ }
+
+ if (n_minkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = minkeys;
+ arg.ndatums = n_minkeys;
+ minoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (min_inclusive)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 ||
+ minoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !min_inclusive)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Records returned by the query will be > bounds[minoff],
+ * because min_scankey is >= bounds[minoff], that is, no
+ * records of the partition at minoff will be returned. Go
+ * to the next bound.
+ */
+ if (minoff < partdesc->boundinfo->ndatums - 1)
+ minoff += 1;
+
+ /*
+ * Make sure to skip a gap.
+ * Note: There are ndatums + 1 slots in the indexes array.
+ */
+ if (partdesc->boundinfo->indexes[minoff] < 0 &&
+ partdesc->boundinfo->indexes[minoff + 1] >= 0)
+ minoff += 1;
+ break;
+ }
+ }
+ else
+ minoff = 0;
+
+ if (n_maxkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = maxkeys;
+ arg.ndatums = n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
- for (i = 0; i < partdesc->nparts; i++)
- result = lappend_int(result, i);
+ is_equal = false;
+
+ do
+ {
+ if (max_inclusive)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 ||
+ maxoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (max_inclusive)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !max_inclusive)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because bounds[maxoff] <= max_scankey, we may need to
+ * consider the next partition as well, in addition to
+ * the partition at maxoff and earlier.
+ */
+ if (!is_equal || max_inclusive)
+ maxoff += 1;
+
+ /* Make sure to skip a gap. */
+ if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1)
+ maxoff -= 1;
+ break;
+ }
+ }
+ else
+ maxoff = partdesc->boundinfo->ndatums - 1;
+
+ *min_datum_index = minoff;
+ *max_datum_index = maxoff;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ /*
+ * Multiple values may belong to the same partition, so make
+ * sure we don't add the same partition index again.
+ */
+ result = list_append_unique_int(result, partition_idx);
+ }
+
+ /* If no bounding keys exist, include the null partition too. */
+ if (null_partition_idx >= 0 &&
+ (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL))
+ {
+ *null_partition_chosen = true;
+ result = list_append_unique_int(result, null_partition_idx);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ if (partition_idx >= 0)
+ result = lappend_int(result, partition_idx);
+ }
+ break;
+ }
return result;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index de50b5d86a..ec51d89caa 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1155,7 +1155,9 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
List *pcqual;
+#endif
/*
* We assume the relation has already been safely locked.
@@ -1241,6 +1243,7 @@ get_relation_constraints(PlannerInfo *root,
}
}
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
/* Append partition predicates, if any */
pcqual = RelationGetPartitionQual(relation);
if (pcqual)
@@ -1258,6 +1261,7 @@ get_relation_constraints(PlannerInfo *root,
result = list_concat(result, pcqual);
}
+#endif
heap_close(relation, NoLock);
--
2.11.0
0005-Add-more-tests-for-the-new-partitioning-related-plan.patch (text/plain; charset=UTF-8)
From a8808856cb8a6e4553c72b8c0418a69b2bb1aa47 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 5/5] Add more tests for the new partitioning-related planning
code
---
src/test/regress/expected/partition.out | 449 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 80 ++++++
4 files changed, 531 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..ae5f59e3a6
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,449 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_null
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(3 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+(7 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+(15 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+(5 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+(11 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+(5 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+(11 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 30; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+(15 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(9 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+(15 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+(5 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+(7 rows)
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+(7 rows)
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p3
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p4
+ Filter: ((a > 1) AND (a = 10))
+(11 rows)
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3efgh
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+(7 rows)
+
+drop table lp, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2fd3f2b1b1..2eb81fcf41 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de30a7..6611662149 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..423ac3726f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,80 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* empty */
+explain (costs off) select * from rlp where a <= 31;
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+
+drop table lp, rlp, mc3p;
--
2.11.0
Forgot to mention a couple of important points about the relation of some
of the patches here to the patches and discussion at the
partitionwise-join thread [1].
On 2017/09/06 19:38, Amit Langote wrote:
[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner
This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.
This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching the query's scan keys with its
partition key. We wouldn't be able to do that if the intermediate
partitioned tables didn't have a RelOptInfo.
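To make the point about per-level pruning concrete, here is a minimal sketch, not taken from the patches. It assumes a toy model in which every partitioned table (including intermediate ones) is visited on its own and its children are pruned by comparing an equality scan key against each child's range on that table's own key. All names (SketchRel, SketchQuery, sketch_count_leaves) are hypothetical, and integer ranges stand in for real partition bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchRel SketchRel;

/* Hypothetical stand-in for a table in a partition tree. */
struct SketchRel
{
    int         lower;          /* inclusive bound on the parent's key */
    int         upper;          /* exclusive bound on the parent's key */
    int         keycol;         /* column this table is partitioned on,
                                 * or -1 for a leaf partition */
    int         nchildren;
    const SketchRel *const *children;
};

/* Hypothetical stand-in for the query's equality scan keys (two columns). */
typedef struct SketchQuery
{
    bool        has_eq[2];
    int         eq[2];
} SketchQuery;

/*
 * Count the leaf partitions that survive pruning.  Each partitioned table
 * in the tree is handled separately, using only its own partition key,
 * which is what a per-sub-tree planning structure makes possible.
 */
static int
sketch_count_leaves(const SketchRel *rel, const SketchQuery *q)
{
    int         i;
    int         n = 0;

    if (rel->keycol < 0)
        return 1;               /* leaf partition: it gets scanned */

    assert(rel->keycol < 2);

    for (i = 0; i < rel->nchildren; i++)
    {
        const SketchRel *child = rel->children[i];

        /* Prune the child if the scan key falls outside its range. */
        if (q->has_eq[rel->keycol] &&
            (q->eq[rel->keycol] < child->lower ||
             q->eq[rel->keycol] >= child->upper))
            continue;

        n += sketch_count_leaves(child, q);
    }

    return n;
}
```

In a flattened expansion the intermediate table would have no node of its own here, so the second-level key could not be matched against the query in this way.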
There is a patch in Ashutosh's posted series of patches that does
more or less the same thing that this patch does. He included it in his
series of patches because, IIUC, the main partitionwise-join planning
logic that one of the later patches implements depends on being able to
consider applying that new planning technique individually for every
partition sub-tree, instead of just at the whole tree root.
One notable difference from his patch is that while his patch will expand
in a non-flattened manner even in the case where the parent is the result
relation of a query, my patch doesn't in that case, because the new
partition-pruning technique cannot be applied to an inheritance parent that
is a result relation, for example,
update partitioned_table set ...
And AFAICS, partitionwise-join cannot be applied to such a parent either.
Note however that if there are other instances of the same partitioned
table (in the FROM list of an update statement) or other partitioned
tables in the query, they will be expanded in a non-flattened manner,
because they are themselves not the result relations of the query. So,
the new partition-pruning and (supposedly) partitionwise-join can be
applied for those other partitioned tables.
[PATCH 2/5] WIP: planner-side changes for partition-pruni[...]
Also, it adds certain fields to the RelOptInfos of partitioned tables that
reflect their partitioning properties.
There is something called PartitionScheme, a notion that one of
Ashutosh's patches invented and that this patch incorporates as one of the new
fields in RelOptInfo that I mentioned above (along with a list of
PartitionSchemes in the planner-global PlannerInfo). Although
PartitionScheme is not significant for the task of partition-pruning
itself, it's still useful. On Ashutosh's suggestion, I adopted the same
in my patch series, so that the partition-wise join patch doesn't end up
conflicting with the partition-pruning patch while trying to implement the
same and can get straight to the task of implementing partition-wise joins.
The same patch in the partition-wise join patch series that introduces
PartitionScheme also introduces a field in RelOptInfo called
partexprs, which records the partition key expressions. Since
partition-pruning has use for the same, I incorporated it here,
in a format that Ashutosh's partition-wise join patch can use directly,
instead of the latter having to hack it again to make it suitable to store
partition key expressions of joinrels. Again, that's to minimize
conflicts and let his patch just find the field to use as is, instead of
implementing it first.
Lastly, this patch introduces a PartitionAppendInfo in a partitioned
table's RelOptInfo that stores AppendRelInfos of the partitions (child
relations) that survive partition-pruning, which serves to identify those
partitions' RelOptInfos. Along with the identities of surviving
partitions, it also stores the partitioning configuration of the
partitioned table after partitions are pruned. That includes
partdesc->boundinfo (which is simply a pointer into the table's relcache)
and a few other fields that are set by partition-pruning code, such as
min_datum_index, max_datum_index, null_partition_chosen, that describe the
result after pruning. So, for two partitioned tables being joined, if the
boundinfos match per partition_bounds_equal() and these other fields
match, they can be safely partition-wise joined.
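The compatibility test described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: SketchBoundInfo and SketchPrunedInfo are hypothetical stand-ins for PartitionBoundInfo and the post-pruning state in PartitionAppendInfo, and sketch_bounds_equal() is a simple element-wise comparison used in place of the real partition_bounds_equal().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PartitionBoundInfo. */
typedef struct SketchBoundInfo
{
    int         ndatums;
    const int  *datums;         /* sorted bound values */
} SketchBoundInfo;

/* Hypothetical stand-in for the state recorded after pruning. */
typedef struct SketchPrunedInfo
{
    const SketchBoundInfo *boundinfo;
    int         min_datum_index;
    int         max_datum_index;
    bool        null_partition_chosen;
} SketchPrunedInfo;

/* Stand-in for partition_bounds_equal(): element-wise comparison. */
static bool
sketch_bounds_equal(const SketchBoundInfo *a, const SketchBoundInfo *b)
{
    int         i;

    assert(a != NULL && b != NULL);

    if (a->ndatums != b->ndatums)
        return false;
    for (i = 0; i < a->ndatums; i++)
    {
        if (a->datums[i] != b->datums[i])
            return false;
    }
    return true;
}

/*
 * Two pruned partitioned tables are treated as joinable partition-wise
 * only if their bounds match and pruning left the same range of datums
 * (and the same null-partition decision) on both sides.
 */
static bool
sketch_can_partitionwise_join(const SketchPrunedInfo *a,
                              const SketchPrunedInfo *b)
{
    return sketch_bounds_equal(a->boundinfo, b->boundinfo) &&
        a->min_datum_index == b->min_datum_index &&
        a->max_datum_index == b->max_datum_index &&
        a->null_partition_chosen == b->null_partition_chosen;
}
```

Two tables with identical bounds but different surviving ranges (for example, because the query restricts only one of them) would fail this check.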
[1]: /messages/by-id/CAFjFpRfRDhWp=oguNjyzN=NMoOD+RCC3wS+b+xbGKwKUk0dRKg@mail.gmail.com
On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
There is a patch in Ashutosh's posted series of patches that does
more or less the same thing that this patch does. He included it in his
series of patches because, IIUC, the main partitionwise-join planning
logic that one of the later patches implements depends on being able to
consider applying that new planning technique individually for every
partition sub-tree, instead of just at the whole tree root.
One notable difference from his patch is that while his patch will expand
in a non-flattened manner even in the case where the parent is the result
relation of a query, my patch doesn't in that case, because the new
partition-pruning technique cannot be applied to an inheritance parent that
is a result relation, for example,
update partitioned_table set ...
And AFAICS, partitionwise-join cannot be applied to such a parent either.
Note however that if there are other instances of the same partitioned
table (in the FROM list of an update statement) or other partitioned
tables in the query, they will be expanded in a non-flattened manner,
because they are themselves not the result relations of the query. So,
the new partition-pruning and (supposedly) partitionwise-join can be
applied for those other partitioned tables.
It seems to me that it would be better not to write new patches for
things that already have patches without a really clear explanation
of what's wrong with the already-existing patch; I don't see any
such explanation here. Instead of writing your own patch for this to
duel with his, why not review his and help him correct any
deficiencies which you can spot? Then we have one patch with more
review instead of two patches with less review both of which I have to
read and try to decide which is better.
In this case, I think Ashutosh has the right idea. I think that
handling the result-relation and non-result-relation differently
creates an unpleasant asymmetry. With your patch, we have to deal
with three cases: (a) partitioned tables that were expanded
level-by-level because they are not result relations, (b) partitioned
tables that were expanded "flattened" because they are result
relations, and (c) non-partitioned tables that were expanded
"flattened". With Ashutosh's approach, we only have two cases to
worry about in the future rather than three, and I like that better.
Your patch also appears to change things so that the table actually
referenced in the query ends up with an AppendRelInfo for the parent,
which seems pointless. And it has no tests.
There are a couple of hunks from your patch that we might want or need
to incorporate into Ashutosh's patch. The change to
relation_excluded_by_constraints() looks like it might be useful,
although it needs a better comment and some tests. Also, Ashutosh's
patch has no equivalent of your change to add_paths_to_append_rel().
I'm not clear what the code you've introduced there is supposed to be
doing, and I'm suspicious that it is confusing "partition root" with
"table named in the query", which will often be the same but not
always; the user could have named an intermediate partition. Can you
expand on what this is doing?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017/09/08 4:41, Robert Haas wrote:
On Thu, Sep 7, 2017 at 7:16 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
There is a patch in Ashutosh's posted series of patches, which does
more or less the same thing that this patch does. He included it in his
series of patches, because, IIUC, the main partitionwise-join planning
logic that one of the later patches implements depends on being able to
consider applying that new planning technique individually for every
partition sub-tree, instead of just at the whole tree root.
One notable difference from his patch is that while his patch will expand
in non-flattened manner even in the case where the parent is the result
relation of a query, my patch doesn't in that case, because the new
partition-pruning technique cannot be applied to an inheritance parent that
is a result relation, for example,
update partitioned_table set ...
And AFAICS, partitionwise-join cannot be applied to such a parent either.
Note however that if there are other instances of the same partitioned
table (in the FROM list of an update statement) or other partitioned
tables in the query, they will be expanded in a non-flattened manner,
because they are themselves not the result relations of the query. So,
the new partition-pruning and (supposedly) partitionwise-join can be
applied for those other partitioned tables.
It seems to me that it would be better not to write new patches for
things that already have patches without a really clear explanation
of what's wrong with the already-existing patch; I don't see any
such explanation here. Instead of writing your own patch for this to
duel with his, why not review his and help him correct any
deficiencies which you can spot? Then we have one patch with more
review instead of two patches with less review both of which I have to
read and try to decide which is better.
Sorry, I think I should have just used Ashutosh's patch.
In this case, I think Ashutosh has the right idea. I think that
handling the result-relation and non-result-relation differently
creates an unpleasant asymmetry. With your patch, we have to deal
with three cases: (a) partitioned tables that were expanded
level-by-level because they are not result relations, (b) partitioned
tables that were expanded "flattened" because they are result
relations, and (c) non-partitioned tables that were expanded
"flattened". With Ashutosh's approach, we only have two cases to
worry about in the future rather than three, and I like that better.
I tend to agree with this now.
Your patch also appears to change things so that the table actually
referenced in the query ends up with an AppendRelInfo for the parent,
which seems pointless.
Actually, it wouldn't, because my patch also got rid of the notion of
adding the duplicate RTE for original parent, because I thought the
duplicate RTE was pointless in the partitioning case.
There are a couple of hunks from your patch that we might want or need
to incorporate into Ashutosh's patch. The change to
relation_excluded_by_constraints() looks like it might be useful,
although it needs a better comment and some tests.
I think we could just drop that part from this patch. It also looks like
Ashutosh has a patch elsewhere concerning this.
https://commitfest.postgresql.org/14/1108/
Maybe we could discuss what to do about this on that thread. Now that not
only the root partitioned table, but also other partitioned tables in the
tree get an RTE with inh = true, I think it would be interesting to
consider his patch.
Also, Ashutosh's
patch has no equivalent of your change to add_paths_to_append_rel().
I'm not clear what the code you've introduced there is supposed to be
doing, and I'm suspicious that it is confusing "partition root" with
"table named in the query", which will often be the same but not
always; the user could have named an intermediate partition. Can you
expand on what this is doing?
I've replied on the partition-wise thread explaining why changes in the
add_paths_to_append_rel() are necessary.
Anyway, I'm dropping my patch in favor of the patch on the other thread.
Sorry for the duplicated effort involved in having to look at both the
patches.
Thanks,
Amit
[1]: /messages/by-id/CA+TgmoZEUonD9dUZH1FBEyq=PEv_KvE3wC=A=0zm-_tRz_917A@mail.gmail.com
On 21 August 2017 at 18:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.
I'm interested in seeing improvements in this area, so I've put my
name down to review this.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017/09/15 10:55, David Rowley wrote:
On 21 August 2017 at 18:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've been working on implementing a way to perform plan-time
partition-pruning that is hopefully faster than the current method of
using constraint exclusion to prune each of the potentially many
partitions one-by-one. It's not fully cooked yet though.
I'm interested in seeing improvements in this area, so I've put my
name down to review this.
Great, thanks!
I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.
Also, I must inform all of those who are looking at this thread that I
won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
Sat, due to some personal business.
Thanks,
Amit
On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/04 10:10, Amit Langote wrote:
On 2017/09/02 2:52, Robert Haas wrote:
[PATCH 2/5] WIP: planner-side changes for partition-pruning
This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.
Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.
+ Node *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
It appears that this patch always assumes that the clause will be of the
form "var op const", but it can also be "const op var".
That's the reason why, in the example below, where the condition is the
same in both queries, it can prune only in the first case but not in the
second.
postgres=# explain select * from t where t.a < 2;
QUERY PLAN
--------------------------------------------------------
Append (cost=0.00..2.24 rows=1 width=8)
-> Seq Scan on t1 (cost=0.00..2.24 rows=1 width=8)
Filter: (a < 2)
(3 rows)
postgres=# explain select * from t where 2>t.a;
QUERY PLAN
--------------------------------------------------------
Append (cost=0.00..4.49 rows=2 width=8)
-> Seq Scan on t1 (cost=0.00..2.24 rows=1 width=8)
Filter: (2 > a)
-> Seq Scan on t2 (cost=0.00..2.25 rows=1 width=8)
Filter: (2 > a)
(5 rows)
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi Dilip,
Thanks for looking at the patch.
On 2017/09/15 13:43, Dilip Kumar wrote:
On Wed, Sep 6, 2017 at 4:08 PM, Amit Langote
[PATCH 2/5] WIP: planner-side changes for partition-pruning
This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.
Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.
+ Node *leftop = get_leftop(clause);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
It appears that this patch always assumes that the clause will be of the
form "var op const", but it can also be "const op var".
That's the reason why, in the example below, where the condition is the
same in both queries, it can prune only in the first case but not in the
second.
postgres=# explain select * from t where t.a < 2;
QUERY PLAN
--------------------------------------------------------
Append (cost=0.00..2.24 rows=1 width=8)
-> Seq Scan on t1 (cost=0.00..2.24 rows=1 width=8)
Filter: (a < 2)
(3 rows)
postgres=# explain select * from t where 2>t.a;
QUERY PLAN
--------------------------------------------------------
Append (cost=0.00..4.49 rows=2 width=8)
-> Seq Scan on t1 (cost=0.00..2.24 rows=1 width=8)
Filter: (2 > a)
-> Seq Scan on t2 (cost=0.00..2.25 rows=1 width=8)
Filter: (2 > a)
(5 rows)
Yeah, there are a bunch of smarts still missing in that patch as it is.
Thanks,
Amit
On 2017/09/15 11:16, Amit Langote wrote:
I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.
Rebased patches attached. Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required. Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.
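For readers following along, commuting a clause can be sketched roughly as
below. This is only an illustration of the idea, not the code in the
attached patch: ToyClause, ToyOperand, ToyOp and the toy_* functions are
hypothetical, operators are a small enum rather than catalog OIDs, and the
partition key is assumed to be a plain column identified by its attribute
number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ToyOp
{
	TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT
} ToyOp;

typedef enum ToyOperandKind
{
	TOY_VAR, TOY_CONST
} ToyOperandKind;

typedef struct ToyOperand
{
	ToyOperandKind kind;
	int			value;		/* attribute number for a Var, else the constant */
} ToyOperand;

typedef struct ToyClause
{
	ToyOperand	left;
	ToyOp		op;
	ToyOperand	right;
} ToyClause;

/* Operator that gives the same result once the operands are swapped. */
static ToyOp
toy_commutator(ToyOp op)
{
	switch (op)
	{
		case TOY_LT:
			return TOY_GT;
		case TOY_LE:
			return TOY_GE;
		case TOY_GE:
			return TOY_LE;
		case TOY_GT:
			return TOY_LT;
		case TOY_EQ:
			break;
	}
	return TOY_EQ;			/* equality commutes to itself */
}

/*
 * Rewrite "const op var" in place as "var op' const", then report whether
 * the clause is usable for pruning on the given partition key column.
 */
static bool
toy_normalize_clause(ToyClause *clause, int partattno)
{
	assert(clause != NULL);

	if (clause->left.kind == TOY_CONST && clause->right.kind == TOY_VAR)
	{
		ToyOperand	tmp = clause->left;

		clause->left = clause->right;
		clause->right = tmp;
		clause->op = toy_commutator(clause->op);
	}

	return clause->left.kind == TOY_VAR &&
		clause->right.kind == TOY_CONST &&
		clause->left.value == partattno;
}
```

With this, Dilip's second query (2 > t.a) is normalized to the same shape
as the first (t.a < 2), so both reach the key-matching step as
"var op const".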
A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1]. Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here. It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches. I would certainly like
to review 0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).
Also, I must inform all of those who are looking at this thread that I
won't be able to respond to emails from tomorrow (9/16, Sat) until 9/23,
Sat, due to some personal business.
Just a reminder.
Thanks,
Amit
[1]: /messages/by-id/CAFiTN-skmaqeCVaoAHCBqe2DyfO3f6sgdtEjHWrUgi0kV1yPLQ@mail.gmail.com
Attachments:
0001-Some-optimizer-data-structures-for-partitioned-rels.patch (text/plain; charset=UTF-8)
From ff9ccd8df6555cfca31e54e22293ac1613db327c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 1/5] Some optimizer data structures for partitioned rels
Nobody uses it though.
---
src/backend/optimizer/util/plancat.c | 120 +++++++++++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++
src/include/nodes/nodes.h | 1 +
src/include/nodes/relation.h | 135 +++++++++++++++++++++++++++++++++++
src/include/optimizer/plancat.h | 2 +
5 files changed, 278 insertions(+)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..f7e3a1df5f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -68,6 +68,8 @@ static List *get_relation_constraints(PlannerInfo *root,
static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
Relation heapRelation);
static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ Relation relation);
/*
* get_relation_info -
@@ -420,6 +422,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
+ /* Collect partitioning info, if relevant. */
+ if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ get_relation_partition_info(root, rel, relation);
+
heap_close(relation, NoLock);
/*
@@ -1802,3 +1808,117 @@ has_row_triggers(PlannerInfo *root, Index rti, CmdType event)
heap_close(relation, NoLock);
return result;
}
+
+static void
+get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ Relation relation)
+{
+ int i;
+ ListCell *l;
+ PartitionKey key = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ rel->part_scheme = find_partition_scheme(root, relation);
+ rel->partexprs = (List **) palloc0(key->partnatts * sizeof(List *));
+
+ l = list_head(key->partexprs);
+ for (i = 0; i < key->partnatts; i++)
+ {
+ Expr *keyCol;
+
+ if (key->partattrs[i] != 0)
+ {
+ keyCol = (Expr *) makeVar(rel->relid,
+ key->partattrs[i],
+ key->parttypid[i],
+ key->parttypmod[i],
+ key->parttypcoll[i],
+ 0);
+ }
+ else
+ {
+ if (l == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ keyCol = (Expr *) copyObject(lfirst(l));
+ l = lnext(l);
+ }
+
+ rel->partexprs[i] = list_make1(keyCol);
+ }
+
+ /* Values are filled in build_simple_rel(). */
+ rel->child_appinfos = (AppendRelInfo **) palloc0(partdesc->nparts *
+ sizeof(AppendRelInfo *));
+
+ /*
+ * A PartitionAppendInfo to map this table to its immediate partitions
+ * that will be scanned by this query. At the same time, it records the
+ * table's partitioning properties reflecting any partition-pruning that
+ * might occur to satisfy the query. Rest of the fields are set in
+ * get_rel_partitions() and set_append_rel_size().
+ */
+ rel->painfo = makeNode(PartitionAppendInfo);
+ rel->painfo->boundinfo = partdesc->boundinfo;
+}
+
+/*
+ * find_partition_scheme
+ *
+ * The function returns a canonical partition scheme which exactly matches the
+ * partitioning scheme of the given relation if one exists in the list of
+ * canonical partitioning schemes maintained in PlannerInfo. If none of the
+ * existing partitioning schemes match, the function creates a canonical
+ * partition scheme and adds it to the list.
+ *
+ * For an unpartitioned table or for a multi-level partitioned table it returns
+ * NULL. See comments in the function for more details.
+ */
+PartitionScheme
+find_partition_scheme(PlannerInfo *root, Relation relation)
+{
+ ListCell *lc;
+ PartitionKey key = RelationGetPartitionKey(relation);
+ char strategy = key->strategy;
+ int partnatts = key->partnatts;
+ PartitionScheme part_scheme = NULL;
+
+ /* Search for a matching partition scheme and return if found one. */
+ foreach(lc, root->partition_schemes)
+ {
+ part_scheme = lfirst(lc);
+
+ /* Match various partitioning attributes. */
+ if (strategy != part_scheme->strategy ||
+ partnatts != part_scheme->partnatts ||
+ memcmp(key->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->parttypmod, part_scheme->parttypmod,
+ sizeof(int32) * partnatts) != 0 ||
+ memcmp(key->partcollation, part_scheme->partcollation,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->partopfamily, part_scheme->partopfamily,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(key->partopcintype, part_scheme->partopcintype,
+ sizeof(Oid) * partnatts) != 0)
+ continue;
+
+ /* Found a matching partition scheme. */
+ return part_scheme;
+ }
+
+ /* Did not find matching partition scheme. Create one. */
+ part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData));
+
+ part_scheme->strategy = strategy;
+ part_scheme->partnatts = partnatts;
+ part_scheme->parttypid = key->parttypid;
+ part_scheme->parttypmod = key->parttypmod;
+ part_scheme->partcollation = key->partcollation;
+ part_scheme->partopfamily = key->partopfamily;
+ part_scheme->partopcintype = key->partopcintype;
+
+ /* Add the partitioning scheme to PlannerInfo. */
+ root->partition_schemes = lappend(root->partition_schemes, part_scheme);
+
+ return part_scheme;
+}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c7b2695ebb..f0973b83b9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
#include <limits.h>
#include "miscadmin.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -163,6 +164,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
else
rel->top_parent_relids = NULL;
+ rel->child_appinfos = NULL;
+ rel->part_scheme = NULL;
+ rel->partexprs = NULL;
+ rel->painfo = NULL;
+
/* Check type of rtable entry */
switch (rte->rtekind)
{
@@ -218,7 +224,18 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
if (rte->inh)
{
ListCell *l;
+ AppendRelInfo **child_appinfos = NULL;
+ int i;
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->child_appinfos != NULL);
+ Assert(rel->painfo != NULL);
+ child_appinfos = rel->child_appinfos;
+ }
+
+ i = 0;
foreach(l, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
@@ -229,6 +246,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
(void) build_simple_rel(root, appinfo->child_relid,
rel);
+
+ if (child_appinfos)
+ child_appinfos[i++] = appinfo;
}
}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..63196a1211 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
T_SpecialJoinInfo,
T_AppendRelInfo,
T_PartitionedChildRelInfo,
+ T_PartitionAppendInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d50ff55681..0f4996b424 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -266,6 +266,8 @@ typedef struct PlannerInfo
List *distinct_pathkeys; /* distinctClause pathkeys, if any */
List *sort_pathkeys; /* sortClause pathkeys, if any */
+ List *partition_schemes; /* List of PartitionScheme objects. */
+
List *initial_rels; /* RelOptInfos we are now trying to join */
/* Use fetch_upper_rel() to get any particular upper rel */
@@ -326,6 +328,48 @@ typedef struct PlannerInfo
((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
rt_fetch(rti, (root)->parse->rtable))
+/*
+ * Partitioning scheme
+ * Structure to hold partitioning scheme for a given relation.
+ *
+ * Multiple relations may be partitioned in the same way. The relations
+ * resulting from joining such relations may be partitioned in the same way as
+ * the joining relations. Similarly, relations derived from such relations by
+ * grouping, sorting may be partitioned in the same way as the underlying scan
+ * relations. All such relations partitioned in the same way share the
+ * partitioning scheme.
+ *
+ * PlannerInfo stores a list of distinct "canonical" partitioning schemes.
+ * RelOptInfo of a partitioned relation holds the pointer to "canonical"
+ * partitioning scheme.
+ *
+ * We store opclass declared input data types instead of partition key
+ * datatypes since those are the ones used to compare partition bounds instead
+ * of actual partition key data types. Since partition key data types and the
+ * opclass declared input data types are expected to be binary compatible (per
+ * ResolveOpClass()), both of those should have same byval and length
+ * properties.
+ *
+ * The structure caches information about partition key data type to be used
+ * while matching partition bounds. While comparing partition schemes we don't
+ * need to compare this information as it should be same when opclass declared
+ * input data types are same for two partitioned relations.
+ */
+typedef struct PartitionSchemeData
+{
+ char strategy; /* Partitioning strategy */
+ int16 partnatts; /* Number of partitioning attributes */
+
+ /* The following arrays each have partnatts members. */
+ Oid *parttypid; /* Type OIDs */
+ int32 *parttypmod; /* Typemod values */
+ Oid *partcollation; /* Partitioning collation */
+ Oid *partopfamily; /* Operator family OIDs */
+ Oid *partopcintype; /* Operator class-declared input type OIDs */
+} PartitionSchemeData;
+
+typedef struct PartitionSchemeData *PartitionScheme;
+
/*----------
* RelOptInfo
@@ -515,6 +559,9 @@ typedef enum RelOptKind
/* Is the given relation an "other" relation? */
#define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
+typedef struct AppendRelInfo AppendRelInfo;
+typedef struct PartitionAppendInfo PartitionAppendInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -592,6 +639,42 @@ typedef struct RelOptInfo
/* used by "other" relations */
Relids top_parent_relids; /* Relids of topmost parents */
+
+ /* Fields set for partitioned relations */
+
+ /*
+ * Information about the partitioning attributes, such as the number of
+ * attributes, arrays containing per-attribute type/typmod, partitioning
+ * collation, operator family OIDs, etc.
+ */
+ PartitionScheme part_scheme;
+
+ /*
+ * Following contains the exact identities of the individual partitioning
+ * attributes. For example, if the attribute is a table's column, then
+ * it will be represented herein by a Var node for the same. This is
+ * structured as an array of Lists with part_scheme->partnatts members,
+ * with each list containing the expression(s) corresponding to the ith
+ * partitioning attribute (0 <= i < part_scheme->partnatts) of this rel.
+ * For baserels, there is just a single expression in each slot (the ith
+ * list) of the array, because it corresponds to just one table. But for
+ * a joinrel, there will be as many expressions as there are tables
+ * involved in that joinrel. We have to do it that way, because in the
+ * joinrel case, the same corresponding partitioning attribute may have
+ * different identities in different tables involved in the join; for
+ * example, a Var node's varno will differ and so might varattnos.
+ */
+ List **partexprs;
+
+ /* AppendRelInfos of *all* partitions of the table. */
+ AppendRelInfo **child_appinfos;
+
+ /*
+ * For a partitioned relation, the following represents the identities
+ * of its live partitions (their RT indexes) and some information about
+ * the bounds that the live partitions satisfy.
+ */
+ PartitionAppendInfo *painfo;
} RelOptInfo;
/*
@@ -2031,6 +2114,58 @@ typedef struct PartitionedChildRelInfo
List *child_rels;
} PartitionedChildRelInfo;
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+typedef struct PartitionKeyData *PartitionKey;
+
+/*
+ * PartitionAppendInfo - Properties of partitions contained in the Append path
+ * of a given partitioned table
+ */
+typedef struct PartitionAppendInfo
+{
+ NodeTag type;
+
+ /*
+ * List of AppendRelInfos of the table's partitions that satisfy a given
+ * query.
+ */
+ List *live_partition_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
+
+ /*
+ * The following simply copies the pointer to boundinfo in the table's
+ * PartitionDesc.
+ */
+ PartitionBoundInfo boundinfo;
+
+ /*
+ * Indexes in the boundinfo->datums array of the smallest and the largest
+ * value of the partition key that the query allows. They are set by
+ * calling get_partitions_for_keys().
+ */
+ int min_datum_idx;
+ int max_datum_idx;
+
+ /*
+ * Does this Append contain the null-accepting partition, if one exists
+ * and is allowed by the query's quals.
+ */
+ bool contains_null_partition;
+
+ /*
+ * Does this Append contain the default partition, if one exists and is
+ * allowed by the query's quals.
+ */
+ bool contains_default_partition;
+} PartitionAppendInfo;
+
/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..c45db074c6 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -56,5 +56,7 @@ extern Selectivity join_selectivity(PlannerInfo *root,
SpecialJoinInfo *sjinfo);
extern bool has_row_triggers(PlannerInfo *root, Index rti, CmdType event);
+extern PartitionScheme find_partition_scheme(PlannerInfo *root,
+ Relation relation);
#endif /* PLANCAT_H */
--
2.11.0
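As a reading aid for the find_partition_scheme() idea in 0001, here is a
small standalone sketch of a find-or-create lookup for canonical schemes.
It is an illustration only and assumes a much-reduced scheme: ToyScheme,
ToyPlanner and toy_find_scheme() are hypothetical names, only the strategy
and the key type OIDs are compared, and a fixed-size array stands in for
the List kept in PlannerInfo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_KEYS	4
#define TOY_MAX_SCHEMES	8

/* Reduced stand-in for PartitionSchemeData. */
typedef struct ToyScheme
{
	char		strategy;		/* e.g. 'r' for range, 'l' for list */
	int			partnatts;		/* number of key columns */
	unsigned int parttypid[TOY_MAX_KEYS];	/* per-column type OIDs */
} ToyScheme;

/* Reduced stand-in for the scheme list in PlannerInfo. */
typedef struct ToyPlanner
{
	int			nschemes;
	ToyScheme	schemes[TOY_MAX_SCHEMES];
} ToyPlanner;

/*
 * Return the canonical scheme matching the given properties, creating it
 * if none exists yet.  Returns NULL if the toy table is full.
 */
static const ToyScheme *
toy_find_scheme(ToyPlanner *root, char strategy, int partnatts,
				const unsigned int *parttypid)
{
	size_t		nbytes;
	ToyScheme  *s;
	int			i;

	assert(root != NULL && parttypid != NULL);
	assert(partnatts > 0 && partnatts <= TOY_MAX_KEYS);
	nbytes = sizeof(unsigned int) * (size_t) partnatts;

	/* Search for a matching scheme first. */
	for (i = 0; i < root->nschemes; i++)
	{
		s = &root->schemes[i];
		if (s->strategy == strategy &&
			s->partnatts == partnatts &&
			memcmp(s->parttypid, parttypid, nbytes) == 0)
			return s;
	}

	/* None matched, so register a new canonical scheme. */
	if (root->nschemes == TOY_MAX_SCHEMES)
		return NULL;
	s = &root->schemes[root->nschemes++];
	s->strategy = strategy;
	s->partnatts = partnatts;
	memcpy(s->parttypid, parttypid, nbytes);
	return s;
}
```

Because every identically partitioned table gets the same pointer back,
later code can test whether two relations share a scheme with a plain
pointer comparison instead of comparing the arrays again.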
0002-WIP-planner-side-changes-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 24bc6a3428730e24b04b2b1282960e45ffeb0467 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 17:31:42 +0900
Subject: [PATCH 2/5] WIP: planner-side changes for partition-pruning
Firstly, this adds a stub get_partitions_for_keys() in partition.c
with appropriate interface for the caller to specify bounding scan
keys, along with other information about the scan keys extracted
from the query, such as NULL-ness of the keys, inclusive-ness, etc.
More importantly, this implements the planner-side logic to extract
bounding scan keys to be passed to get_partitions_for_keys. That is,
it will go through rel->baserestrictinfo and match individual clauses
to partition keys and construct lower bound and upper bound tuples,
which may cover only a prefix of a multi-column partition key.
A bunch of smarts are still missing when mapping the clause operands
with keys. For example, code to match a clause that is specified as
(constant op var) doesn't exist. Also, redundant keys are not
eliminated, for example, a combination of clauses a = 10 and a > 1
will cause the latter clause a > 1 to take over, resulting in
needless scanning of partitions containing values a > 1 and a < 10.
...constraint exclusion is still used, because
get_partitions_for_keys is just a stub...
---
src/backend/catalog/partition.c | 43 ++++
src/backend/optimizer/path/allpaths.c | 358 ++++++++++++++++++++++++++++++----
src/include/catalog/partition.h | 9 +
3 files changed, 371 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1ab6dba7ae..ccf8a1fa67 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1335,6 +1335,49 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes of rel's partitions that will need to be
+ * scanned given the bounding scan keys.
+ *
+ * Each value in the returned list can be used as an index into the oids array
+ * of the partition descriptor.
+ *
+ * Inputs:
+ * keynullness contains between 0 and (key->partnatts - 1) values, each
+ * telling what kind of NullTest has been applied to the corresponding
+ * partition key column. minkeys represents the lower bound on the partition
+ * key of the records that the query will return, while maxkeys
+ * represents the upper bound. min_inclusive and max_inclusive tell whether
+ * the bounds specified by minkeys and maxkeys are inclusive, respectively.
+ *
+ * Other outputs:
+ * *min_datum_index will return the index in boundinfo->datums of the first
+ * datum that the query's bounding keys allow to be returned for the query.
+ * Similarly, *max_datum_index. *null_partition_chosen returns whether
+ * the null partition will be scanned.
+ *
+ * TODO: Implement.
+ */
+List *
+get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen,
+ bool *default_partition_chosen)
+{
+ List *result = NIL;
+ int i;
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+
+ for (i = 0; i < partdesc->nparts; i++)
+ result = lappend_int(result, i);
+
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5b746a906a..6e5efe98f9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -846,6 +847,251 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the query clauses
+ *
+ * Returned list contains the AppendRelInfos of the chosen partitions.
+ */
+static List *
+get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *indexes;
+ List *result = NIL;
+ ListCell *lc1,
+ *lc2;
+ int keyPos;
+ List *matchedclauses[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ Datum minkeys[PARTITION_MAX_KEYS],
+ maxkeys[PARTITION_MAX_KEYS];
+ bool need_next_min,
+ need_next_max,
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_minkeys = 0,
+ n_maxkeys = 0,
+ i;
+
+ /*
+ * Match individual OpExprs in the query's restriction with individual
+ * partition key columns. There is one list per key.
+ */
+ memset(keynullness, -1, sizeof(keynullness));
+ memset(matchedclauses, 0, sizeof(matchedclauses));
+ keyPos = 0;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+
+ foreach(lc2, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst(lc2);
+ Expr *clause = rinfo->clause;
+
+ if (is_opclause(clause))
+ {
+ Node *leftop = get_leftop(clause),
+ *rightop = get_rightop(clause);
+ Oid expr_op = ((OpExpr *) clause)->opno;
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = (Node *) ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ matchedclauses[keyPos] = lappend(matchedclauses[keyPos],
+ clause);
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[keyPos] = IS_NOT_NULL;
+ }
+ else if (equal(rightop, partkey))
+ {
+ Oid commutator = get_commutator(expr_op);
+
+ if (OidIsValid(commutator))
+ {
+ OpExpr *commutated_expr;
+
+ /*
+ * Generate a commutated copy of the expression, but
+ * try to make it look valid, because we only want
+ * it to put the constant operand in a place that the
+ * following code knows as the only place to find it.
+ */
+ commutated_expr = (OpExpr *) copyObject(clause);
+ commutated_expr->opno = commutator; /* really? */
+ commutated_expr->args = list_make2(rightop, leftop);
+ matchedclauses[keyPos] =
+ lappend(matchedclauses[keyPos],
+ commutated_expr);
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[keyPos] = IS_NOT_NULL;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ keynullness[keyPos] = nulltest->nulltesttype;
+ }
+ }
+
+ /* Onto finding clauses matching the next partition key. */
+ keyPos++;
+ }
+
+ /*
+ * Determine the min keys and the max keys using btree semantics-based
+ * interpretation of the clauses' operators.
+ */
+
+ /*
+ * XXX - There should be a step similar to _bt_preprocess_keys() here,
+ * to eliminate any redundant scan keys for a given partition column. For
+ * example, among a <= 4 and a <= 5, we need only keep a <= 4, since it is
+ * more restrictive, and discard a <= 5. While doing that, we can also
+ * check to see if there exists a contradictory combination of scan keys
+ * that makes the query trivially false for all records in the table.
+ */
+ memset(minkeys, 0, sizeof(minkeys));
+ memset(maxkeys, 0, sizeof(maxkeys));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc1, matchedclauses[i])
+ {
+ Expr *clause = lfirst(lc1);
+ Const *rightop = (Const *) get_rightop(clause);
+ Oid opno = ((OpExpr *) clause)->opno,
+ opfamily = rel->part_scheme->partopfamily[i];
+ StrategyNumber strategy;
+
+ strategy = get_op_opfamily_strategy(opno, opfamily);
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+ }
+ if (strategy == BTLessStrategyNumber)
+ need_next_max = false;
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+ }
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_min = false;
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ }
+ minkey_set[i] = true;
+ min_incl = true;
+
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ }
+ maxkey_set[i] = true;
+ max_incl = true;
+ break;
+
+ /*
+ * This might mean '<>', but we don't have anything for that
+ * case yet. Perhaps, handle that as key < const OR
+ * key > const, once we have props needed for handling OR
+ * clauses.
+ */
+ default:
+ min_incl = max_incl = false;
+ break;
+ }
+ }
+ }
+
+ /* Ask partition.c which partitions it thinks match the keys. */
+ indexes = get_partitions_for_keys(parent, keynullness,
+ minkeys, n_minkeys, min_incl,
+ maxkeys, n_maxkeys, max_incl,
+ &rel->painfo->min_datum_idx,
+ &rel->painfo->max_datum_idx,
+ &rel->painfo->contains_null_partition,
+ &rel->painfo->contains_default_partition);
+
+ if (indexes != NIL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ int first_index,
+ last_index;
+ first_index = linitial_int(indexes);
+ last_index = llast_int(indexes);
+ Assert(first_index <= last_index ||
+ rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE);
+#endif
+
+ foreach(lc1, indexes)
+ {
+ int partidx = lfirst_int(lc1);
+ AppendRelInfo *appinfo = rel->child_appinfos[partidx];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+ Assert(partdesc->oids[partidx] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->painfo->live_partition_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1106,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1120,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_rel_partitions(root, rel, rte);
+ Assert(rel->painfo != NULL);
+ rel->painfo->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1158,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1171,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1118,6 +1379,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->painfo && rel->painfo)
+ {
+ rel->painfo->live_partitioned_rels =
+ list_concat(rel->painfo->live_partitioned_rels,
+ list_copy(childrel->painfo->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1213,14 +1485,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->painfo->live_partition_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1291,33 +1578,31 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
/*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append paths
- * will get flattened into the parent anyway. For a subquery RTE, no
- * PartitionedChildRelInfo exists; we collect all partitioned_rels
- * associated with any child. (This assumes that we don't need to look
- * through multiple levels of subquery RTEs; if we ever do, we could
- * create a PartitionedChildRelInfo with the accumulated list of
- * partitioned_rels which would then be found when populated our parent
- * rel with paths. For the present, that appears to be unnecessary.)
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->painfo. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
*/
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpcted rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->painfo->live_partitioned_rels;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1330,17 +1615,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->painfo)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->painfo->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 454a940a23..4f27d3018e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -99,6 +99,7 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
EState *estate,
PartitionDispatchData **failed_at,
TupleTableSlot **failed_slot);
+
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
extern Oid get_default_partition_oid(Oid parentId);
extern void update_default_partition_oid(Oid parentId, Oid defaultPartId);
@@ -106,4 +107,12 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen,
+ bool *default_partition_chosen);
#endif /* PARTITION_H */
--
2.11.0
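
The XXX comment in get_rel_partitions() above asks for a step, similar to _bt_preprocess_keys(), that drops redundant scan keys for a partition column and detects contradictory ones. What follows is only a minimal standalone sketch of that idea over plain int values, not code from the patch: the names ScanBound, KeyRange and reduce_bounds() are hypothetical, and a real implementation would have to compare Datums through the partition key's btree support functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scan key: "col <op> value" over plain ints. */
typedef enum { OP_LT, OP_LE, OP_GE, OP_GT } BoundOp;

typedef struct
{
    BoundOp op;
    int     value;
} ScanBound;

/* Tightest range implied by a set of ScanBounds on one column. */
typedef struct
{
    bool has_min, has_max;
    int  min, max;
    bool min_incl, max_incl;
    bool contradictory;     /* no value can satisfy all bounds */
} KeyRange;

static KeyRange
reduce_bounds(const ScanBound *bounds, size_t n)
{
    KeyRange r = {0};
    size_t   i;

    assert(bounds != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        int  v = bounds[i].value;
        bool incl = (bounds[i].op == OP_LE || bounds[i].op == OP_GE);

        if (bounds[i].op == OP_LT || bounds[i].op == OP_LE)
        {
            /* Smaller upper bound is tighter; on a tie, exclusive wins. */
            if (!r.has_max || v < r.max || (v == r.max && !incl))
            {
                r.has_max = true;
                r.max = v;
                r.max_incl = incl;
            }
        }
        else
        {
            /* Larger lower bound is tighter; on a tie, exclusive wins. */
            if (!r.has_min || v > r.min || (v == r.min && !incl))
            {
                r.has_min = true;
                r.min = v;
                r.min_incl = incl;
            }
        }
    }

    /*
     * Empty if min > max, or if they are equal and either end is exclusive.
     * (This treats the type as merely ordered; it does not notice that
     * "a > 4 AND a < 5" is also empty for integers.)
     */
    if (r.has_min && r.has_max)
        r.contradictory = (r.min > r.max) ||
            (r.min == r.max && !(r.min_incl && r.max_incl));

    return r;
}
```

Given a <= 4 and a <= 5, reduce_bounds() keeps only a <= 4; given a > 5 and a < 3 it sets contradictory, in which case a caller could skip scanning any partition.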
Attachment: 0003-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (text/plain; charset=UTF-8)
From a5725aca1168b6539ea591a886775a7f3d170e8d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ccf8a1fa67..0133748234 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -111,6 +111,30 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the user-defined
+ * partition bound of a given existing partition, while an instance of the
+ * following struct describes either a new partition bound being compared
+ * against existing bounds (is_bound is true in that case and either lbound
+ * or rbound is set), or a new tuple's partition key specified in datums
+ * (ndatums = number of partition key columns).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -139,14 +163,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -755,10 +780,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -809,6 +840,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -830,8 +862,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -845,9 +880,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2380,12 +2415,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -2397,6 +2435,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -2427,12 +2466,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -2629,12 +2669,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2656,11 +2696,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2668,17 +2708,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2689,12 +2747,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2708,20 +2767,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2734,8 +2792,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
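
To illustrate why patch 0003 lets the caller say how many datums the probe carries, here is a rough standalone sketch of a binary search over multi-column bounds where the probe may supply only a prefix of the key columns. It assumes plain int columns and a fixed NCOLS; ProbeArg, bound_cmp() and bound_bsearch() are hypothetical stand-ins for PartitionBoundCmpArg, partition_bound_cmp() and partition_bound_bsearch(), and MINVALUE/MAXVALUE handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NCOLS 2     /* number of key columns in every bound (assumption) */

/* Hypothetical probe: the first ndatums key columns of a search key. */
typedef struct
{
    const int *datums;
    int        ndatums;     /* may be less than NCOLS */
} ProbeArg;

/*
 * Compare one bound (NCOLS ints) with the probe, looking only at the
 * columns the probe actually supplies.  Returns <0, 0 or >0.
 */
static int
bound_cmp(const int *bound, const ProbeArg *arg)
{
    int i;

    assert(arg->ndatums >= 0 && arg->ndatums <= NCOLS);

    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] < arg->datums[i])
            return -1;
        if (bound[i] > arg->datums[i])
            return 1;
    }
    return 0;
}

/*
 * bounds is a flat, sorted array of nbounds * NCOLS ints.  Returns the
 * offset of a bound that is <= the probe (the greatest such bound unless
 * an equal one is hit first), or -1 if every bound is greater.
 * *is_equal tells whether the returned bound compared equal.
 */
static int
bound_bsearch(const int *bounds, int nbounds, const ProbeArg *arg,
              bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmpval = bound_cmp(bounds + mid * NCOLS, arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

With a prefix probe, several adjacent bounds can compare equal and the search stops at whichever of them it reaches first, so a caller that needs the first or last matching bound still has to step through the neighbouring offsets.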
Attachment: 0004-WIP-Implement-get_partitions_for_keys.patch (text/plain; charset=UTF-8)
From 28778969ecddb2a9c3f31ff2ed119a342e97fb2d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 4/5] WIP: Implement get_partitions_for_keys()
Disable constraint exclusion that occurs using internal partition
constraints, so that it's apparent what the new partition-pruning
code still needs to do to be able to create a plan matching the plan
that the traditional constraint exclusion based partition-pruning
would result in.
---
src/backend/catalog/partition.c | 210 ++++++++++++++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +
2 files changed, 209 insertions(+), 5 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0133748234..9d4b7c1a7f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1391,8 +1391,6 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
* datum that the query's bounding keys allow to be returned for the query.
* Similarly, *max_datum_index. *null_partition_chosen returns whether
* the null partition will be scanned.
- *
- * TODO: Implement.
*/
List *
get_partitions_for_keys(Relation rel,
@@ -1403,12 +1401,214 @@ get_partitions_for_keys(Relation rel,
bool *null_partition_chosen,
bool *default_partition_chosen)
{
+ int i,
+ minoff,
+ maxoff;
List *result = NIL;
- int i;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundCmpArg arg;
+ bool is_equal,
+ scan_default = false;
+ int null_partition_idx = partdesc->boundinfo->null_index;
- for (i = 0; i < partdesc->nparts; i++)
- result = lappend_int(result, i);
+ *null_partition_chosen = false;
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if partdesc->boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keynullness[i] == IS_NULL)
+ {
+ if (null_partition_idx >= 0)
+ {
+ *null_partition_chosen = true;
+ result = list_make1_int(null_partition_idx);
+ }
+ else
+ result = NIL;
+
+ return result;
+ }
+ }
+
+ /*
+ * If query provides no quals, don't forget to scan the default partition.
+ */
+ if (n_minkeys == 0 && n_maxkeys == 0)
+ scan_default = true;
+
+ if (n_minkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = minkeys;
+ arg.ndatums = n_minkeys;
+ minoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (min_inclusive)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 ||
+ minoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !min_inclusive)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Records returned by the query will be > bounds[minoff],
+ * because min_scankey is >= bounds[minoff], that is, no
+ * records of the partition at minoff will be returned. Go
+ * to the next bound.
+ */
+ if (minoff < partdesc->boundinfo->ndatums - 1)
+ minoff += 1;
+
+ /*
+ * Make sure to skip a gap.
+ * Note: There are ndatums + 1 slots in the indexes array.
+ */
+ if (partdesc->boundinfo->indexes[minoff] < 0 &&
+ partdesc->boundinfo->indexes[minoff + 1] >= 0)
+ minoff += 1;
+ break;
+ }
+ }
+ else
+ minoff = 0;
+
+ if (n_maxkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = maxkeys;
+ arg.ndatums = n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (max_inclusive)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 ||
+ maxoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (max_inclusive)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !max_inclusive)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because bounds[maxoff] <= max_scankey, we may need
+ * to consider the next partition as well, in addition to
+ * the partition at maxoff and earlier.
+ */
+ if (!is_equal || max_inclusive)
+ maxoff += 1;
+
+ /* Make sure to skip a gap. */
+ if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1)
+ maxoff -= 1;
+ break;
+ }
+ }
+ else
+ maxoff = partdesc->boundinfo->ndatums - 1;
+
+ *min_datum_index = minoff;
+ *max_datum_index = maxoff;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ /*
+ * Multiple values may belong to the same partition, so make
+ * sure we don't add the same partition index again.
+ */
+ result = list_append_unique_int(result, partition_idx);
+ }
+
+ /* If no bounding keys exist, include the null partition too. */
+ if (null_partition_idx >= 0 &&
+ (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL))
+ {
+ *null_partition_chosen = true;
+ result = list_append_unique_int(result, null_partition_idx);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ /*
+ * If a valid partition exists for this range, add its
+ * index, if not, the default partition (if any) would be
+ * covering that range, so request to include the same.
+ */
+ if (partition_idx >= 0)
+ result = lappend_int(result, partition_idx);
+ else
+ scan_default = true;
+ }
+ break;
+ }
+
+ if (scan_default && partdesc->boundinfo->default_index >= 0)
+ result = lappend_int(result, partdesc->boundinfo->default_index);
return result;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f7e3a1df5f..26ea2b4162 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1155,7 +1155,9 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
List *pcqual;
+#endif
/*
* We assume the relation has already been safely locked.
@@ -1241,6 +1243,7 @@ get_relation_constraints(PlannerInfo *root,
}
}
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
/* Append partition predicates, if any */
pcqual = RelationGetPartitionQual(relation);
if (pcqual)
@@ -1258,6 +1261,7 @@ get_relation_constraints(PlannerInfo *root,
result = list_concat(result, pcqual);
}
+#endif
heap_close(relation, NoLock);
--
2.11.0
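
As a rough illustration of the list-partitioning branch of get_partitions_for_keys(), the sketch below maps a range condition on a single-character key to offsets in a sorted array of list values, then collects the distinct partition indexes between those offsets. It is a simplification under stated assumptions rather than the patch's logic: the key is a plain char, null and default partitions are ignored, the offset adjustment is my own, and list_bsearch() and prune_list_partitions() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Offset of the greatest value <= probe in sorted values[], or -1. */
static int
list_bsearch(const char *values, int nvalues, char probe, bool *is_equal)
{
    int lo = -1;
    int hi = nvalues - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (values[mid] <= probe)
        {
            lo = mid;
            *is_equal = (values[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * values[]  : all list values of all partitions, sorted
 * indexes[] : indexes[i] is the partition that accepts values[i]
 * Stores the distinct partitions that may hold rows with
 * lo <(=) key <(=) hi into result[] (room for nvalues entries) and
 * returns how many were stored.
 */
static int
prune_list_partitions(const char *values, const int *indexes, int nvalues,
                      char lo, bool lo_incl, char hi, bool hi_incl,
                      int *result)
{
    bool eq;
    int  minoff, maxoff, i, j;
    int  n = 0;

    assert(nvalues >= 0);

    /* The search lands on the last value <= lo; it qualifies only if it
     * equals lo and the lower bound is inclusive.  Otherwise start at the
     * next value. */
    minoff = list_bsearch(values, nvalues, lo, &eq);
    if (!(eq && lo_incl))
        minoff++;

    /* The last value <= hi qualifies unless it equals hi and the upper
     * bound is exclusive. */
    maxoff = list_bsearch(values, nvalues, hi, &eq);
    if (eq && !hi_incl)
        maxoff--;

    for (i = minoff; i <= maxoff; i++)
    {
        /* Several values may map to one partition; add it only once. */
        for (j = 0; j < n; j++)
            if (result[j] == indexes[i])
                break;
        if (j == n)
            result[n++] = indexes[i];
    }
    return n;
}
```

For values a..f mapped to three partitions as {0, 1, 1, 0, 2, 2} (so that 'a' and 'd' share partition 0, 'b' and 'c' partition 1, 'e' and 'f' partition 2), the condition key > 'a' AND key < 'd' selects only partition 1, while key > 'a' AND key <= 'd' selects partitions 1 and 0, in that order.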
Attachment: 0005-Add-more-tests-for-the-new-partitioning-related-plan.patch (text/plain; charset=UTF-8)
From a8660afe1d147dcaa5ff46a8ed4faf366242d4d0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 5/5] Add more tests for the new partitioning-related planning
code
---
src/test/regress/expected/partition.out | 465 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 82 ++++++
4 files changed, 549 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..db92368fc5
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,465 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_null
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(3 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+(7 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+(15 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+(5 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+(11 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+(5 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+(11 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 30; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+(15 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(9 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+(15 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+(5 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+(7 rows)
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+(7 rows)
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p3
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p4
+ Filter: ((a > 1) AND (a = 10))
+(11 rows)
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3efgh
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+(7 rows)
+
+drop table lp, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2fd3f2b1b1..2eb81fcf41 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de30a7..6611662149 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..616ad95611
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,82 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* empty */
+explain (costs off) select * from rlp where a <= 31;
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, 0, 0) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, 0, 0);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+
+drop table lp, rlp, mc3p;
--
2.11.0
On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Rebased patches attached. Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required. Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.
A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1].
It doesn't merely overlap; it's obviously a derivative work, and the
commit message in your version should credit all the authors.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 16, 2017 at 4:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 15, 2017 at 4:50 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Rebased patches attached. Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required. Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.
A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1].
It doesn't merely overlap; it's obviously a derivative work,
Yes, it is. I noted upthread [1] that most of these are derived
from Ashutosh's patch, at his suggestion. I guess I should have
repeated that in this message too, sorry.
and the
commit message in your version should credit all the authors.
That was a mistake on my part, too. I will be more careful from here on.
Thanks,
Amit
[1]: /messages/by-id/0e829199-a43c-2a66-b966-89a0020a6cd4@lab.ntt.co.jp
On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/15 11:16, Amit Langote wrote:
Thanks for the updated patch. I was going through the logic of
get_rel_partitions in 0002, as very similar functionality will be
required by runtime partition pruning, on which Beena is working. The
only difference is that here we are processing
"rel->baserestrictinfo", whereas for runtime pruning we also need
to process join clauses that are pushed down to the appendrel.
So, can we make some generic logic that can be used by both patches?
Basically, we need to make two changes:
1. In get_rel_partitions, instead of processing
"rel->baserestrictinfo", we can take a clause list as input; that way we
can pass any clause list to this function.
2. Don't call the "get_partitions_for_keys" routine from
"get_rel_partitions"; instead, get_rel_partitions can just prepare
minkey and maxkey, and its caller can call
get_partitions_for_keys, because for runtime pruning we need to call
get_partitions_for_keys at runtime.
Even after these changes there will be one problem:
get_partitions_for_keys directly fetches "rightop->constvalue",
whereas for runtime pruning we need to store rightop itself and
calculate the value at runtime by param evaluation. I haven't yet
thought about how we can make this last part generic.
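To make the first proposed change concrete, here is a minimal sketch of a bounds-computing step that takes an arbitrary clause list as input and does no partition lookup itself. Everything in it is an assumption for illustration: SimpleClause, KeyBounds and compute_key_bounds are hypothetical names, the key is a single int column, and none of this is the code in the posted patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's RestrictInfo/OpExpr. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } CmpOp;

typedef struct {
    int   keycol;   /* column the clause constrains */
    CmpOp op;       /* comparison operator, column on the left */
    int   value;    /* constant on the right-hand side */
} SimpleClause;

typedef struct {
    bool has_min, min_incl;
    int  min;
    bool has_max, max_incl;
    int  max;
} KeyBounds;

/* Keep the larger lower bound; on a tie, exclusive is the tighter one. */
static void tighten_min(KeyBounds *b, int v, bool incl)
{
    if (!b->has_min || v > b->min || (v == b->min && !incl)) {
        b->has_min = true;
        b->min = v;
        b->min_incl = incl;
    }
}

/* Keep the smaller upper bound; on a tie, exclusive is the tighter one. */
static void tighten_max(KeyBounds *b, int v, bool incl)
{
    if (!b->has_max || v < b->max || (v == b->max && !incl)) {
        b->has_max = true;
        b->max = v;
        b->max_incl = incl;
    }
}

/*
 * Derive min/max bounds for one key column from any clause list.
 * The caller decides which list to pass (restriction clauses, join
 * clauses, ...) and when to turn the bounds into partitions.
 */
void compute_key_bounds(const SimpleClause *clauses, size_t nclauses,
                        int keycol, KeyBounds *out)
{
    size_t i;

    out->has_min = out->has_max = false;
    out->min_incl = out->max_incl = false;
    out->min = out->max = 0;

    for (i = 0; i < nclauses; i++) {
        const SimpleClause *c = &clauses[i];

        if (c->keycol != keycol)
            continue;           /* clause is about some other column */

        switch (c->op) {
            case OP_GT: tighten_min(out, c->value, false); break;
            case OP_GE: tighten_min(out, c->value, true);  break;
            case OP_LT: tighten_max(out, c->value, false); break;
            case OP_LE: tighten_max(out, c->value, true);  break;
            case OP_EQ:
                tighten_min(out, c->value, true);
                tighten_max(out, c->value, true);
                break;
        }
    }
}
```

Because the function only fills in KeyBounds, the partition lookup can happen in the caller, either right away at plan time or later at run time.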
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
I have done some refactoring of the code: I have moved the code that
finds the matching clauses into a separate function, so that it
can fetch the matching clauses from any given list of restriction clauses.
It can be applied on top of 0002-WIP:
planner-side-changes-for-partition-pruning.patch
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0002-refactor_get_rel_partition.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e5efe9..2bb3641 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -846,50 +846,28 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
}
-/*
- * get_rel_partitions
- * Return the list of partitions of rel that pass the query clauses
- *
- * Returned list contains the AppendInfos of the chosen partitions.
- */
-static List *
-get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+static void
+get_matching_clause(RelOptInfo *rel, List *clauses, List **matchedclauses,
+ NullTestType *keynullness)
{
- Relation parent = heap_open(rte->relid, NoLock);
- PartitionDesc partdesc = RelationGetPartitionDesc(parent);
- List *indexes;
- List *result = NIL;
- ListCell *lc1,
- *lc2;
+ ListCell *lc;
int keyPos;
- List *matchedclauses[PARTITION_MAX_KEYS];
- NullTestType keynullness[PARTITION_MAX_KEYS];
- Datum minkeys[PARTITION_MAX_KEYS],
- maxkeys[PARTITION_MAX_KEYS];
- bool need_next_min,
- need_next_max,
- minkey_set[PARTITION_MAX_KEYS],
- maxkey_set[PARTITION_MAX_KEYS],
- min_incl,
- max_incl;
- int n_minkeys = 0,
- n_maxkeys = 0,
- i;
+ int i;
/*
* Match individual OpExprs in the query's restriction with individual
* partition key columns. There is one list per key.
*/
- memset(keynullness, -1, sizeof(keynullness));
- memset(matchedclauses, 0, sizeof(matchedclauses));
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+ memset(matchedclauses, 0, PARTITION_MAX_KEYS * sizeof(List*));
keyPos = 0;
for (i = 0; i < rel->part_scheme->partnatts; i++)
{
Node *partkey = linitial(rel->partexprs[i]);
- foreach(lc2, rel->baserestrictinfo)
+ foreach(lc, clauses)
{
- RestrictInfo *rinfo = lfirst(lc2);
+ RestrictInfo *rinfo = lfirst(lc);
Expr *clause = rinfo->clause;
if (is_opclause(clause))
@@ -948,6 +926,37 @@ get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/* Onto finding clauses matching the next partition key. */
keyPos++;
}
+}
+
+/*
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the query clauses
+ *
+ * Returned list contains the AppendInfos of the chosen partitions.
+ */
+static List *
+get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ List *indexes;
+ List *result = NIL;
+ ListCell *lc1;
+ List *matchedclauses[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ Datum minkeys[PARTITION_MAX_KEYS],
+ maxkeys[PARTITION_MAX_KEYS];
+ bool need_next_min,
+ need_next_max,
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_minkeys = 0,
+ n_maxkeys = 0,
+ i;
+
+ get_matching_clause(rel, rel->baserestrictinfo, matchedclauses,
+ keynullness);
/*
* Determine the min keys and the max keys using btree semantics-based
Hi Dilip.
Thanks for looking at the patches and the comments.
On 2017/09/16 18:43, Dilip Kumar wrote:
On Fri, Sep 15, 2017 at 2:20 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/15 11:16, Amit Langote wrote:
Thanks for the updated patch. I was going through the logic of
get_rel_partitions in 0002 as almost similar functionality will be
required by runtime partition pruning on which Beena is working. The
only difference is that here we are processing the
"rel->baserestrictinfo" and in case of runtime pruning, we also need
to process join clauses which are pushed down to appendrel.
Yeah, I agree with the point you seem to be making that
get_rel_partitions() covers a lot of functionality, which it would be nice
to break down into reusable function(s) with suitable signature(s) that
the executor will also be able to use.
Your proposed refactoring patch down-thread seems to be a good step in
that direction. Thanks for working on it.
So can we make some generic logic which can be used for both the patches.
So basically, we need to do two changes
1. In get_rel_partitions instead of processing the
"rel->baserestrictinfo" we can take clause list as input that way we
can pass any clause list to this function.
2. Don't call "get_partitions_for_keys" routine from the
"get_rel_partitions", instead, get_rel_partitions can just prepare
minkey, maxkey and the caller of the get_rel_partitions can call
get_partitions_for_keys, because for runtime pruning we need to call
get_partitions_for_keys at runtime.
It's not clear to me whether get_rel_partitions() itself, as it is, is
callable from outside the planner, because its signature contains
RelOptInfo. We have the RelOptInfo in the signature, because we want to
mark certain fields in it so that later planning steps can use them. So,
get_rel_partitions()'s job is not just to match clauses and find
partitions, but also to perform certain planner-specific tasks of
generating information that the later planning steps will want to use.
That may turn out to be unnecessary, but until we know that, let's not try
to export get_rel_partitions() itself out of the planner.
OTOH, the function that your refactoring patch separates out to match
clauses to partition keys and extract bounding values seems reusable
outside the planner and we should export it in such a way that it can be
used in the executor. Then, the hypothetical executor function that does
the pruning will first call the planner's clause-matching function,
followed by calling get_partitions_for_keys() in partition.c to get the
selected partitions.
We should be careful when designing the interface of the exported function
to make sure it's not bound to the planner. Your patch still maintains
the RelOptInfo in the signature of the clause-matching function, which the
executor pruning function won't have access to.
After these changes also there will be one problem that the
get_partitions_for_keys is directly fetching the "rightop->constvalue"
whereas, for runtime pruning, we need to store rightop itself and
calculate the value at runtime by param evaluation, I haven't yet
thought how can we make this last part generic.
I don't think any code introduced by the patch in partition.c itself looks
inside OpExpr (or any type of clause for that matter). That is, I don't
see where get_partitions_for_keys() is looking at rightop->constvalue.
All it receives to work with are arrays of Datums and some other relevant
information like inclusivity, nullness, etc.
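As an illustration of a lookup that sees only plain values plus inclusivity flags and never a clause, here is a minimal sketch. It assumes a single int range key, and RangePart, KeyBounds and partitions_for_bounds are hypothetical names; the real get_partitions_for_keys() works on Datums and the partition descriptor, and this linear scan is only a stand-in for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one range partition: [lower, upper). */
typedef struct {
    bool lower_unbounded;   /* lower bound is MINVALUE */
    int  lower;             /* inclusive lower bound */
    bool upper_unbounded;   /* upper bound is MAXVALUE */
    int  upper;             /* exclusive upper bound */
} RangePart;

/* Bounds already extracted from the query; no expression nodes here. */
typedef struct {
    bool has_min, min_incl;
    int  min;
    bool has_max, max_incl;
    int  max;
} KeyBounds;

/*
 * Write the indexes of partitions that may contain matching rows into
 * selected[] (which must have room for nparts entries) and return how
 * many were written.  The check is conservative: a partition is skipped
 * only when the bounds prove that it cannot match.
 */
int partitions_for_bounds(const RangePart *parts, int nparts,
                          const KeyBounds *b, int *selected)
{
    int i, n = 0;

    for (i = 0; i < nparts; i++) {
        const RangePart *p = &parts[i];

        /* All values the query allows lie below this partition. */
        if (b->has_max && !p->lower_unbounded &&
            (b->max < p->lower || (b->max == p->lower && !b->max_incl)))
            continue;

        /* All values the query allows lie at or above its upper bound. */
        if (b->has_min && !p->upper_unbounded && b->min >= p->upper)
            continue;

        selected[n++] = i;
    }
    return n;
}
```

With bounds laid out like the rlp leaf ranges in the regression test (ignoring that rlp3 is itself sub-partitioned), "a <= 1" selects the first two entries, "a > 30" selects only the last one, and "a = 30" selects none because 30 falls in the gap between 30 and 31.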
By the way, I'm now rebasing these patches on top of [1] and will try to
merge your refactoring patch in some appropriate way. Will post more
tomorrow.
Thanks,
Amit
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826
On Mon, Sep 25, 2017 at 3:34 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Thanks for looking at the patches and the comments.
It's not clear to me whether get_rel_partitions() itself, as it is, is
callable from outside the planner, because its signature contains
RelOptInfo. We have the RelOptInfo in the signature, because we want to
mark certain fields in it so that latter planning steps can use them. So,
get_rel_partitions()'s job is not just to match clauses and find
partitions, but also to perform certain planner-specific tasks of
generating information that the later planning steps will want to use.
That may turn out to be unnecessary, but until we know that, let's not try
to export get_rel_partitions() itself out of the planner.
OTOH, the function that your refactoring patch separates out to match
clauses to partition keys and extract bounding values seems reusable
outside the planner and we should export it in such a way that it can be
used in the executor. Then, the hypothetical executor function that does
the pruning will first call the planner's clause-matching function,
followed by calling get_partitions_for_keys() in partition.c to get the
selected partitions.
Thanks for your reply.
Actually, we are still planning to call get_matching_clause at
optimizer time only. We cannot use the get_rel_partitions function
directly for runtime pruning, because it does all the work (find
matching clauses, create minkey and maxkey, and call
get_partitions_for_keys) during planning time itself.
For runtime pruning, we are planning to first call the get_matching_clause
function during optimizer time to identify the clauses that
match the partition keys. But for the PARAM_EXEC case we cannot
depend upon baserestrictinfo; instead we will get them from the join
clauses. That's the reason I have separated out get_matching_clause.
But it will still be used during planning time.
After separating out the matching clauses we will do processing somewhat
similar to what "get_rel_partitions" is doing. But at optimizer time
we will not have Datum values for rightop in the PARAM case, so we will
keep track of the PARAM itself.
And, finally, at runtime when we get the PARAM value we can prepare
minkey and maxkey and call the get_partitions_for_keys function.
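To make the plan-time/run-time split concrete, here is a minimal sketch and not code from any posted patch. All names in it (KeyValue, BoundSource, resolve_bound, resolve_bounds, and the param_values array) are hypothetical stand-ins: at plan time each bounding key is recorded either as a constant or as a reference to a parameter slot, and only at run time is it resolved into the values that would go into minkeys/maxkeys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Datum. */
typedef long KeyValue;

/* How a bounding key was specified in the matched clause. */
typedef enum { BOUND_CONST, BOUND_PARAM } BoundKind;

typedef struct BoundSource
{
    BoundKind kind;
    KeyValue  constvalue;   /* valid when kind == BOUND_CONST */
    int       paramid;      /* valid when kind == BOUND_PARAM */
} BoundSource;

/*
 * Resolve one bound at run time.  param_values[] plays the role of the
 * executor's parameter array (an assumption of this sketch).  Returns
 * false if the parameter id is out of range.
 */
bool
resolve_bound(const BoundSource *src, const KeyValue *param_values,
              int nparams, KeyValue *result)
{
    assert(src != NULL && result != NULL);

    if (src->kind == BOUND_CONST)
    {
        *result = src->constvalue;
        return true;
    }
    if (src->paramid < 0 || src->paramid >= nparams)
        return false;
    *result = param_values[src->paramid];
    return true;
}

/* Fill keys[] from sources[]; returns the number of leading keys resolved. */
int
resolve_bounds(const BoundSource *sources, int nkeys,
               const KeyValue *param_values, int nparams, KeyValue *keys)
{
    int i;

    for (i = 0; i < nkeys; i++)
    {
        if (!resolve_bound(&sources[i], param_values, nparams, &keys[i]))
            break;
    }
    return i;
}
```

The resolved keys[] array and the returned count would then correspond to the (minkeys, n_minkeys) or (maxkeys, n_maxkeys) arguments of get_partitions_for_keys.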
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 2017/09/25 20:21, Dilip Kumar wrote:
On Mon, Sep 25, 2017 at 3:34 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Thanks for looking at the patches and the comments.
It's not clear to me whether get_rel_partitions() itself, as it is, is
callable from outside the planner, because its signature contains
RelOptInfo. We have the RelOptInfo in the signature, because we want to
mark certain fields in it so that later planning steps can use them. So,
get_rel_partitions()'s job is not just to match clauses and find
partitions, but also to perform certain planner-specific tasks of
generating information that the later planning steps will want to use.
That may turn out to be unnecessary, but until we know that, let's not try
to export get_rel_partitions() itself out of the planner.
OTOH, the function that your refactoring patch separates out to match
clauses to partition keys and extract bounding values seems reusable
outside the planner and we should export it in such a way that it can be
used in the executor. Then, the hypothetical executor function that does
the pruning will first call the planner's clause-matching function,
followed by calling get_partitions_for_keys() in partition.c to get the
selected partitions.
Thanks for your reply.
Actually, we are still planning to call get_matching_clause at
optimizer time only. We cannot use the get_rel_partitions function
directly for runtime pruning, because it does all the work (find
matching clauses, create minkey and maxkey, and call
get_partitions_for_keys) during planning time itself.
For runtime pruning, we are planning to first call the get_matching_clause
function during optimizer time to identify the clauses that
match the partition keys. But for the PARAM_EXEC case we cannot
depend upon baserestrictinfo; instead we will get them from the join
clauses. That's the reason I have separated out get_matching_clause.
But it will still be used during planning time.
I see. So, in the run-time pruning case, only the work of extracting
bounding values is deferred to execution time. Matching clauses with the
partition key still occurs during planning time. Only that the clauses
that require run-time pruning are not those in rel->baserestrictinfo.
After separating out the matching clause we will do somewhat similar
processing what "get_rel_partitions" is doing. But, at optimizer time
for PARAM we will not have Datum values for rightop, so we will keep
track of the PARAM itself.
I guess information about which PARAMs map to which partition keys will be
kept in the plan somehow.
And, finally at runtime when we get the PARAM value we can prepare
minkey and maxkey and call get_partitions_for_keys function.
Note that get_partitions_for_keys() is not planner code, nor is it bound
with any other planning code. It's callable from executor without much
change. Maybe you already know that though.
Thanks,
Amit
On Tue, Sep 26, 2017 at 2:45 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/25 20:21, Dilip Kumar wrote:
I see. So, in the run-time pruning case, only the work of extracting
bounding values is deferred to execution time. Matching clauses with the
partition key still occurs during planning time. Only that the clauses
that require run-time pruning are not those in rel->baserestrictinfo.
Right.
After separating out the matching clause we will do somewhat similar
processing what "get_rel_partitions" is doing. But, at optimizer time
for PARAM we will not have Datum values for rightop, so we will keep
track of the PARAM itself.
I guess information about which PARAMs map to which partition keys will be
kept in the plan somehow.
Yes.
And, finally at runtime when we get the PARAM value we can prepare
minkey and maxkey and call get_partitions_for_keys function.
Note that get_partitions_for_keys() is not planner code, nor is it bound
with any other planning code. It's callable from executor without much
change. Maybe you already know that though.
Yes, Right.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi Amit,
On 09/15/2017 04:50 AM, Amit Langote wrote:
On 2017/09/15 11:16, Amit Langote wrote:
I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.
Rebased patches attached. Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required. Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.
A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1]. Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here. It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches. I would certainly like
to review 0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).
Could you share your thoughts on the usage of PartitionAppendInfo's
min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.
I'm looking at get_partitions_for_keys.
Best regards,
Jesper
On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
Could you share your thoughts on the usage of PartitionAppendInfo's
min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.
This brings up something that I've kind of been thinking about. There
are sort of four cases when it comes to partition pruning:
1. There is exactly one matching partition. For example, this happens
when there is an equality constraint on every partition column.
2. There are multiple matching partitions which are consecutive. For
example, there is a single level of range partitioning with no default
partition and the single partitioning column is constrained by < > <=
or >=.
3. There are multiple matching partitions which are not consecutive.
This case is probably rare, but it can happen if there is a default
partition, if there are list partitions with multiple bounds that are
interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
and the query allows values >= 4 and <= 5), if the query involves OR
conditions, or if there are multiple levels of partitioning (e.g.
partition by a, subpartition by b, put a range constraint on a and an
equality constraint on b).
4. There are no matching partitions.
One of the goals of this algorithm is to be fast. The obvious way to
cater to case (3) is to iterate through all partitions and test
whether each one works, returning a Bitmapset, but that is O(n).
Admittedly, it might be O(n) with a pretty small constant factor, but
it still seems like exactly the sort of thing that we want to avoid
given the desire to scale to higher partition counts.
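As a reference point for that O(n) approach, here is a minimal sketch of the "test every partition" loop. It is illustrative only: ListPartition, partition_matches_range and prune_by_scanning_all are made-up names, integer keys are assumed, and a uint64_t stands in for a Bitmapset (so at most 64 partitions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VALUES_PER_PART 4

/* A list partition accepting a small set of integer values. */
typedef struct ListPartition
{
    int nvalues;
    int values[MAX_VALUES_PER_PART];
} ListPartition;

/* True if any value accepted by the partition lies in [lo, hi]. */
bool
partition_matches_range(const ListPartition *part, int lo, int hi)
{
    for (int i = 0; i < part->nvalues; i++)
    {
        if (part->values[i] >= lo && part->values[i] <= hi)
            return true;
    }
    return false;
}

/*
 * Test every partition in turn: O(n) in the number of partitions.
 * Bit i of the result is set if partition i must be scanned.
 */
uint64_t
prune_by_scanning_all(const ListPartition *parts, int nparts, int lo, int hi)
{
    uint64_t matched = 0;

    assert(nparts >= 0 && nparts <= 64);
    for (int i = 0; i < nparts; i++)
    {
        if (partition_matches_range(&parts[i], lo, hi))
            matched |= UINT64_C(1) << i;
    }
    return matched;
}
```

With the interleaved bounds from case (3), p1 = (1, 4), p2 = (2), p3 = (3, 5), and a query range of 4..5, bits 0 and 2 come back set: p1 and p3 match while p2, which sits between them, does not.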
I propose that we create a structure that looks like this:
struct foo {
    int min_partition;
    int max_partition;
    Bitmapset *extra_partitions;
};
This indicates that all partitions from min_partition to max_partition
need to be scanned, and in addition any partitions in extra_partitions
need to be scanned. Assuming that we only consider cases where all
partition keys or a leading subset of the partition keys are
constrained, we'll generally be able to get by with just setting
min_partition and max_partition, but extra_partitions can be used to
handle default partitions and interleaved list bounds. For equality
on all partitioning columns, we can do a single bsearch of the bounds
to identify the target partition at a given partitioning level, and
the same thing works for a single range-bound. If there are two
range-bounds (< and > or <= and >= or whatever) we need to bsearch
twice. The default partition, if any and if matched, must also be
included. When there are multiple levels of partitioning things get a
bit more complex -- if someone wants to knock out a partition that
breaks up the range, we might need to shrink the main range to cover
part of it and kick the other indexes out to extra_partitions.
But the good thing is that in common cases with only O(lg n) effort we
can return O(1) data that describes what will be scanned. In cases
where that's not practical we expend more effort but still prune with
maximal effectiveness.
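A rough sketch of the common case described above (single-level range partitioning, no default partition, integer keys) might look like the following. It is only an illustration under those assumptions: PrunedPartitions, partition_for_key and prune_range are hypothetical names, a uint64_t stands in for the Bitmapset, and an empty result is encoded as min_partition > max_partition.

```c
#include <assert.h>
#include <stdint.h>

typedef struct PrunedPartitions
{
    int      min_partition;     /* first partition of the consecutive run */
    int      max_partition;     /* last partition of the run; < min if empty */
    uint64_t extra_partitions;  /* stand-in for Bitmapset: stragglers */
} PrunedPartitions;

/*
 * Binary search: index of the first bound strictly greater than key,
 * or nparts if there is none.  Partition i holds keys k with
 * bounds[i-1] <= k < bounds[i]; partition 0 has no lower bound.
 */
int
partition_for_key(const int *bounds, int nparts, int key)
{
    int lo = 0,
        hi = nparts;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (bounds[mid] > key)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Partitions that can hold keys in [lo_key, hi_key]: two bsearches. */
PrunedPartitions
prune_range(const int *bounds, int nparts, int lo_key, int hi_key)
{
    PrunedPartitions result = { 0, -1, 0 };
    int first,
        last;

    assert(nparts > 0);
    if (lo_key > hi_key)
        return result;
    first = partition_for_key(bounds, nparts, lo_key);
    if (first == nparts)
        return result;          /* no partition accepts keys >= lo_key */
    last = partition_for_key(bounds, nparts, hi_key);
    if (last == nparts)
        last = nparts - 1;      /* clamp to the last existing partition */
    result.min_partition = first;
    result.max_partition = last;
    return result;
}
```

The result is a fixed-size description produced with two binary searches; extra_partitions stays empty here and would only be filled in for default partitions or interleaved list bounds.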
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 09/26/2017 10:33 AM, Robert Haas wrote:
On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
Could you share your thoughts on the usage of PartitionAppendInfo's
min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.
This brings up something that I've kind of been thinking about. There
are sort of four cases when it comes to partition pruning:
1. There is exactly one matching partition. For example, this happens
when there is an equality constraint on every partition column.
2. There are multiple matching partitions which are consecutive. For
example, there is a single level of range partitioning with no default
partition and the single partitioning column is constrained by < > <=
or >=.
3. There are multiple matching partitions which are not consecutive.
This case is probably rare, but it can happen if there is a default
partition, if there are list partitions with multiple bounds that are
interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
and the query allows values >= 4 and <= 5), if the query involves OR
conditions, or if there are multiple levels of partitioning (e.g.
partition by a, subpartition by b, put a range constraint on a and an
equality constraint on b).
4. There are no matching partitions.
One of the goals of this algorithm is to be fast. The obvious way to
cater to case (3) is to iterate through all partitions and test
whether each one works, returning a Bitmapset, but that is O(n).
Admittedly, it might be O(n) with a pretty small constant factor, but
it still seems like exactly the sort of thing that we want to avoid
given the desire to scale to higher partition counts.
I propose that we create a structure that looks like this:
struct foo {
    int min_partition;
    int max_partition;
    Bitmapset *extra_partitions;
};
This indicates that all partitions from min_partition to max_partition
need to be scanned, and in addition any partitions in extra_partitions
need to be scanned. Assuming that we only consider cases where all
partition keys or a leading subset of the partition keys are
constrained, we'll generally be able to get by with just setting
min_partition and max_partition, but extra_partitions can be used to
handle default partitions and interleaved list bounds. For equality
on all partitioning columns, we can do a single bsearch of the bounds
to identify the target partition at a given partitioning level, and
the same thing works for a single range-bound. If there are two
range-bounds (< and > or <= and >= or whatever) we need to bsearch
twice. The default partition, if any and if matched, must also be
included. When there are multiple levels of partitioning things get a
bit more complex -- if someone wants to knock out a partition that
breaks up the range, we might need to shrink the main range to cover
part of it and kick the other indexes out to extra_partitions.
But the good thing is that in common cases with only O(lg n) effort we
can return O(1) data that describes what will be scanned. In cases
where that's not practical we expend more effort but still prune with
maximal effectiveness.
For OLTP style applications 1) would be the common case, and with hash
partitions it would be one equality constraint.
So, changing the method signature to use a data type as you described
above instead of the explicit min_datum_idx / max_datum_idx output
parameters would be more clear.
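For illustration only, here is a sketch of what returning such a data type could look like for the hash case. Nothing in it comes from the hash partitioning patch: PruneResult, toy_hash and prune_hash_equality are hypothetical names, the hash is a toy, and mapping a key to partition "hash % nparts" is a simplification of real modulus/remainder bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Result type returned instead of separate min/max output parameters. */
typedef struct PruneResult
{
    int min_partition;
    int max_partition;
} PruneResult;

/* Toy hash for illustration only (Knuth's multiplicative constant). */
uint32_t
toy_hash(uint32_t key)
{
    return key * UINT32_C(2654435761);
}

/*
 * Equality on the hash key selects exactly one partition, reported as a
 * one-element range.  Assumes partition i holds the keys for which
 * toy_hash(key) % nparts == i.
 */
PruneResult
prune_hash_equality(uint32_t key, int nparts)
{
    PruneResult result;

    assert(nparts > 0);
    result.min_partition = (int) (toy_hash(key) % (uint32_t) nparts);
    result.max_partition = result.min_partition;
    return result;
}
```

A caller would see min_partition == max_partition for case 1), with no range search involved at all.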
One could advocate (*cough*) that the hash partition patch [1] should be
merged first in order to find other instances where other CommitFest
entries don't account for hash partitions at the moment in their
method signatures; Beena noted something similar in [2]. I know that you
said otherwise [3], but this is CommitFest 1, so there is time for a
revert later, and hash partitions are already useful in internal testing.
[1]: https://commitfest.postgresql.org/14/1059/
[2]: /messages/by-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
[3]: http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html
Best regards,
Jesper
On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
One could advocate (*cough*) that the hash partition patch [1] should be
merged first in order to find other instances of where other CommitFest
entries doesn't account for hash partitions at the moment in their method
signatures; Beena noted something similar in [2]. I know that you said
otherwise [3], but this is CommitFest 1, so there is time for a revert
later, and hash partitions are already useful in internal testing.
Well, that's a fair point. I was assuming that committing things in
that order would cause me to win the "least popular committer" award
at least for that day, but maybe not. It's certainly not ideal to
have to juggle that patch along and keep rebasing it over other
changes when it's basically done, and just waiting on other
improvements to land. Anybody else wish to express an opinion?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 25 September 2017 at 23:04, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
By the way, I'm now rebasing these patches on top of [1] and will try to
merge your refactoring patch in some appropriate way. Will post more
tomorrow.
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826
Yeah, I see 0001 conflicts with that. I'm going to set this to waiting
on author while you're busy rebasing this.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017/09/27 1:51, Robert Haas wrote:
On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
One could advocate (*cough*) that the hash partition patch [1] should be
merged first in order to find other instances of where other CommitFest
entries doesn't account for hash partitions at the moment in their method
signatures; Beena noted something similar in [2]. I know that you said
otherwise [3], but this is CommitFest 1, so there is time for a revert
later, and hash partitions are already useful in internal testing.
Well, that's a fair point. I was assuming that committing things in
that order would cause me to win the "least popular committer" award
at least for that day, but maybe not. It's certainly not ideal to
have to juggle that patch along and keep rebasing it over other
changes when it's basically done, and just waiting on other
improvements to land. Anybody else wish to express an opinion?
FWIW, I tend to agree that it would be nice to get the hash partitioning
patch in, even with old constraint exclusion based partition-pruning not
working for hash partitions. That way, it might be more clear what we
need to do in the partition-pruning patches to account for hash partitions.
Thanks,
Amit
Hi David,
On 2017/09/27 6:04, David Rowley wrote:
On 25 September 2017 at 23:04, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
By the way, I'm now rebasing these patches on top of [1] and will try to
merge your refactoring patch in some appropriate way. Will post more
tomorrow.
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826
Yeah, I see 0001 conflicts with that. I'm going to set this to waiting
on author while you're busy rebasing this.
Thanks for the reminder. Just thought I'd say that while I'm actually
done rebasing itself (attaching rebased patches to try 'em out), I'm now
considering Robert's comments and will be busy for a bit revising things
based on those comments.
Some notes about the attached patches:
- 0001 includes refactoring that Dilip proposed upthread [1] (added him as
an author). I slightly tweaked his patch -- renamed the function
get_matching_clause to match_clauses_to_partkey, similar to
match_clauses_to_index.
- Code to set AppendPath's partitioned_rels in add_paths_to_append_rel()
revised by 0a480502b09 (was originally introduced in d3cc37f1d80) is
still revised to get partitioned_rels from a source that is not
PlannerInfo.pcinfo_list. With the new code, partitioned_rels won't
contain RT indexes of the partitioned child tables that were pruned.
Thanks,
Amit
[1]: /messages/by-id/CAFiTN-tGnQzF_4QtbOHT-3hE=OvNaMfbbeRxa4UY0CQyF0G8gQ@mail.gmail.com
Attachments:
0001-WIP-planner-side-changes-for-partition-pruning.patch (text/plain; charset=UTF-8)
From f20aebcad9b089434ba60cd4439fa1a9d55091b8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 1/4] WIP: planner-side changes for partition-pruning
Firstly, this adds a stub get_partitions_for_keys() in partition.c
with appropriate interface for the caller to specify bounding scan
keys, along with other information about the scan keys extracted
from the query, such as NULL-ness of the keys, inclusive-ness, etc.
More importantly, this implements the planner-side logic to extract
bounding scan keys to be passed to get_partitions_for_keys. That is,
it will go through rel->baserestrictinfo and match individual clauses
to partition keys and construct lower bound and upper bound tuples,
which may cover only a prefix of a multi-column partition key.
A bunch of smarts are still missing when mapping the clause operands
with keys. For example, code to match a clause specified as
(constant op var) doesn't exist. Also, redundant keys are not
eliminated; for example, a combination of clauses a = 10 and a > 1
will cause the later clause a > 1 to take over, resulting in
needless scanning of partitions containing values a > 1 and a < 10.
...constraint exclusion is still used, because
get_partitions_for_keys is just a stub...
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 43 ++++
src/backend/optimizer/path/allpaths.c | 390 ++++++++++++++++++++++++++++++----
src/backend/optimizer/util/plancat.c | 10 +
src/backend/optimizer/util/relnode.c | 7 +
src/include/catalog/partition.h | 9 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/relation.h | 66 ++++++
7 files changed, 487 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1ab6dba7ae..ccf8a1fa67 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1335,6 +1335,49 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_for_keys
+ * Returns the list of indexes of rel's partitions that will need to be
+ * scanned given the bounding scan keys.
+ *
+ * Each value in the returned list can be used as an index into the oids array
+ * of the partition descriptor.
+ *
+ * Inputs:
+ * keynullness contains between 0 and (key->partnatts - 1) values, each
+ * telling what kind of NullTest has been applied to the corresponding
+ * partition key column. minkeys represents the lower bound on the
+ * partition key of the records that the query will return, while maxkeys
+ * represents the upper bound. min_inclusive and max_inclusive tell whether
+ * the bounds specified by minkeys and maxkeys are inclusive, respectively.
+ *
+ * Other outputs:
+ * *min_datum_index will return the index in boundinfo->datums of the first
+ * datum that the query's bounding keys allow to be returned for the query.
+ * Similarly, *max_datum_index. *null_partition_chosen returns whether
+ * the null partition will be scanned.
+ *
+ * TODO: Implement.
+ */
+List *
+get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen,
+ bool *default_partition_chosen)
+{
+ List *result = NIL;
+ int i;
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+
+ for (i = 0; i < partdesc->nparts; i++)
+ result = lappend_int(result, i);
+
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a7866a99e0..47f88d1a8f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -135,6 +136,10 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static void match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ List **matchedclauses,
+ NullTestType *keynullness);
/*
@@ -846,6 +851,279 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * match_clauses_to_partkey
+ * Match the clauses in the list with partition's partition key
+ *
+ * Matched clauses are returned in matchedclauses, which is an array with
+ * partnatts members, where each member is a list of clauses matched to the
+ * respective partition key. keynullness array also contains partnatts
+ * members where each member corresponds to the type of the NullTest
+ * encountered for a given partition key.
+ */
+void
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ List **matchedclauses,
+ NullTestType *keynullness)
+{
+ ListCell *lc;
+ int keyPos;
+ int i;
+
+ /*
+ * Match individual OpExprs in the query's restriction with individual
+ * partition key columns. There is one list per key.
+ */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+ memset(matchedclauses, 0, PARTITION_MAX_KEYS * sizeof(List*));
+ keyPos = 0;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+
+ foreach(lc, clauses)
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+ Expr *clause = rinfo->clause;
+
+ if (is_opclause(clause))
+ {
+ Node *leftop = get_leftop(clause),
+ *rightop = get_rightop(clause);
+ Oid expr_op = ((OpExpr *) clause)->opno;
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = (Node *) ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ matchedclauses[keyPos] = lappend(matchedclauses[keyPos],
+ clause);
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[keyPos] = IS_NOT_NULL;
+ }
+ else if (equal(rightop, partkey))
+ {
+ Oid commutator = get_commutator(expr_op);
+
+ if (OidIsValid(commutator))
+ {
+ OpExpr *commutated_expr;
+
+ /*
+ * Generate a commutated copy of the expression, but
+ * try to make it look valid, because we only want
+ * it to put the constant operand in a place that the
+ * following code knows as the only place to find it.
+ */
+ commutated_expr = (OpExpr *) copyObject(clause);
+ commutated_expr->opno = commutator; /* really? */
+ commutated_expr->args = list_make2(rightop, leftop);
+ matchedclauses[keyPos] =
+ lappend(matchedclauses[keyPos],
+ commutated_expr);
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[keyPos] = IS_NOT_NULL;
+ }
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ keynullness[keyPos] = nulltest->nulltesttype;
+ }
+ }
+
+ /* Onto finding clauses matching the next partition key. */
+ keyPos++;
+ }
+}
+
+/*
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the query clauses
+ *
+ * Returned list contains the AppendRelInfos of the chosen partitions.
+ */
+static List *
+get_rel_partitions(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ List *indexes;
+ List *result = NIL;
+ ListCell *lc1;
+ List *matchedclauses[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ Datum minkeys[PARTITION_MAX_KEYS],
+ maxkeys[PARTITION_MAX_KEYS];
+ bool need_next_min,
+ need_next_max,
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_minkeys = 0,
+ n_maxkeys = 0,
+ i;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys.
+ */
+ match_clauses_to_partkey(rel,
+ rel->baserestrictinfo,
+ matchedclauses,
+ keynullness);
+
+ /*
+ * Determine the min keys and the max keys using btree semantics-based
+ * interpretation of the clauses' operators.
+ */
+
+ /*
+ * XXX - There should be a step similar to _bt_preprocess_keys() here,
+ * to eliminate any redundant scan keys for a given partition column. For
+ * example, among a <= 4 and a <= 5, we can only keep a <= 4 for being
+ * more restrictive and discard a <= 5. While doing that, we can also
+ * check to see if there exists a contradictory combination of scan keys
+ * that makes the query trivially false for all records in the table.
+ */
+ memset(minkeys, 0, sizeof(minkeys));
+ memset(maxkeys, 0, sizeof(maxkeys));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < rel->part_scheme->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc1, matchedclauses[i])
+ {
+ Expr *clause = lfirst(lc1);
+ Const *rightop = (Const *) get_rightop(clause);
+ Oid opno = ((OpExpr *) clause)->opno,
+ opfamily = rel->part_scheme->partopfamily[i];
+ StrategyNumber strategy;
+
+ strategy = get_op_opfamily_strategy(opno, opfamily);
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+ }
+ if (strategy == BTLessStrategyNumber)
+ need_next_max = false;
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+ }
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_min = false;
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkeys[i] = rightop->constvalue;
+ if (!minkey_set[i])
+ n_minkeys++;
+ }
+ minkey_set[i] = true;
+ min_incl = true;
+
+ if (need_next_max)
+ {
+ maxkeys[i] = rightop->constvalue;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ }
+ maxkey_set[i] = true;
+ max_incl = true;
+ break;
+
+ /*
+ * This might mean '<>', but we don't have anything for that
+ * case yet. Perhaps, handle that as key < const OR
+ * key > const, once we have props needed for handling OR
+ * clauses.
+ */
+ default:
+ min_incl = max_incl = false;
+ break;
+ }
+ }
+ }
+
+ /* Ask partition.c which partitions it thinks match the keys. */
+ indexes = get_partitions_for_keys(parent, keynullness,
+ minkeys, n_minkeys, min_incl,
+ maxkeys, n_maxkeys, max_incl,
+ &rel->painfo->min_datum_idx,
+ &rel->painfo->max_datum_idx,
+ &rel->painfo->contains_null_partition,
+ &rel->painfo->contains_default_partition);
+
+ if (indexes != NIL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ int first_index,
+ last_index;
+ first_index = linitial_int(indexes);
+ last_index = llast_int(indexes);
+ Assert(first_index <= last_index ||
+ rel->part_scheme->strategy != PARTITION_STRATEGY_RANGE);
+#endif
+
+ foreach(lc1, indexes)
+ {
+ int partidx = lfirst_int(lc1);
+ AppendRelInfo *appinfo = rel->part_appinfos[partidx];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ Assert(partdesc->oids[partidx] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->painfo->live_partition_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1138,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1152,24 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_rel_partitions(root, rel, rte);
+ Assert(rel->painfo != NULL);
+ rel->painfo->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1190,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1203,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1118,6 +1411,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->painfo && rel->painfo)
+ {
+ rel->painfo->live_partitioned_rels =
+ list_concat(rel->painfo->live_partitioned_rels,
+ list_copy(childrel->painfo->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1213,14 +1517,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->painfo->live_partition_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1291,33 +1610,31 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
/*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append paths
- * will get flattened into the parent anyway. For a subquery RTE, no
- * PartitionedChildRelInfo exists; we collect all partitioned_rels
- * associated with any child. (This assumes that we don't need to look
- * through multiple levels of subquery RTEs; if we ever do, we could
- * create a PartitionedChildRelInfo with the accumulated list of
- * partitioned_rels which would then be found when populated our parent
- * rel with paths. For the present, that appears to be unnecessary.)
+ * The AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->painfo. However, that list contains only its immediate
+ * children, so collect live partitioned children from all children that
+ * are themselves partitioned and concatenate them to our list before
+ * finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
*/
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->painfo->live_partitioned_rels;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1330,17 +1647,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->painfo)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->painfo->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index cac46bedf9..49578c0684 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1833,6 +1833,16 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partdesc->boundinfo;
rel->nparts = partdesc->nparts;
rel->partexprs = build_baserel_partition_key_exprs(relation, rel->relid);
+
+ /*
+ * Create a PartitionAppendInfo to map this table to its immediate
+ * partitions that will be scanned by this query. It also records the
+ * table's partitioning properties reflecting any partition pruning that
+ * might have occurred to satisfy the query. The rest of the fields are
+ * set in get_rel_partitions() and set_append_rel_size().
+ */
+ rel->painfo = makeNode(PartitionAppendInfo);
+ rel->painfo->boundinfo = partdesc->boundinfo;
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 077e89ae43..24c8442eae 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -17,6 +17,7 @@
#include <limits.h>
#include "miscadmin.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -151,6 +152,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->boundinfo = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
+ rel->painfo = NULL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -227,8 +229,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -252,6 +258,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 454a940a23..4f27d3018e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -99,6 +99,7 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
EState *estate,
PartitionDispatchData **failed_at,
TupleTableSlot **failed_slot);
+
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
extern Oid get_default_partition_oid(Oid parentId);
extern void update_default_partition_oid(Oid parentId, Oid defaultPartId);
@@ -106,4 +107,12 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* Planner support stuff. */
+extern List *get_partitions_for_keys(Relation rel,
+ NullTestType *keynullness,
+ Datum *minkeys, int n_minkeys, bool min_inclusive,
+ Datum *maxkeys, int n_maxkeys, bool max_inclusive,
+ int *min_datum_index, int *max_datum_index,
+ bool *null_partition_chosen,
+ bool *default_partition_chosen);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 27bd4f3363..63196a1211 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
T_SpecialJoinInfo,
T_AppendRelInfo,
T_PartitionedChildRelInfo,
+ T_PartitionAppendInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 48e6012f7f..6dfa28bd1b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -524,6 +524,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs - Partition key expressions
*
@@ -561,6 +562,9 @@ typedef enum RelOptKind
/* Is the given relation an "other" relation? */
#define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
+typedef struct AppendRelInfo AppendRelInfo;
+typedef struct PartitionAppendInfo PartitionAppendInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -643,9 +647,19 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Partition key expressions. */
+
+ /*
+ * For a partitioned relation, the following represents the identities
+ * of its live partitions (their appinfos) and some information about
+ * the bounds that the live partitions satisfy.
+ */
+ PartitionAppendInfo *painfo;
} RelOptInfo;
/*
@@ -2085,6 +2099,58 @@ typedef struct PartitionedChildRelInfo
List *child_rels;
} PartitionedChildRelInfo;
+/* Forward declarations, to avoid including other headers */
+typedef struct PartitionDispatchData *PartitionDispatch;
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+typedef struct PartitionKeyData *PartitionKey;
+
+/*
+ * PartitionAppendInfo - Properties of partitions contained in the Append path
+ * of a given partitioned table
+ */
+typedef struct PartitionAppendInfo
+{
+ NodeTag type;
+
+ /*
+ * List of AppendRelInfos of the table's partitions that satisfy a given
+ * query.
+ */
+ List *live_partition_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
+
+ /*
+ * The following simply copies the pointer to boundinfo in the table's
+ * PartitionDesc.
+ */
+ PartitionBoundInfo boundinfo;
+
+ /*
+ * Indexes in the boundinfo->datums array of the smallest and the largest
+ * value of the partition key that the query allows. They are set by
+ * calling get_partitions_for_keys().
+ */
+ int min_datum_idx;
+ int max_datum_idx;
+
+ /*
+ * Does this Append contain the null-accepting partition, if one exists
+ * and is allowed by the query's quals?
+ */
+ bool contains_null_partition;
+
+ /*
+ * Does this Append contain the default partition, if one exists and is
+ * allowed by the query's quals?
+ */
+ bool contains_default_partition;
+} PartitionAppendInfo;
+
/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
--
2.11.0
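
In patch 0001 above, set_append_rel_size() and set_append_rel_pathlist() build rel_appinfos up front and then loop over that list, instead of scanning all of root->append_rel_list and skipping non-matching parents inside the loop. For a non-partitioned parent, that up-front step is a plain filter on parent_relid. The following is only a minimal standalone sketch of that filter: SketchAppendRelInfo and sketch_collect_child_appinfos() are hypothetical names, and arrays stand in for the planner's List type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for AppendRelInfo. */
typedef struct SketchAppendRelInfo
{
    unsigned    parent_relid;   /* RT index of the parent */
    unsigned    child_relid;    /* RT index of the child */
} SketchAppendRelInfo;

/*
 * Store pointers to the entries of all_appinfos[] whose parent is
 * parent_relid into out[], preserving their order.  Returns the number of
 * entries written, which never exceeds out_cap.
 */
static size_t
sketch_collect_child_appinfos(const SketchAppendRelInfo *all_appinfos,
                              size_t n_all, unsigned parent_relid,
                              const SketchAppendRelInfo **out,
                              size_t out_cap)
{
    size_t      i;
    size_t      n = 0;

    for (i = 0; i < n_all && n < out_cap; i++)
    {
        /* The array holds children of all parents; ignore the others. */
        if (all_appinfos[i].parent_relid == parent_relid)
            out[n++] = &all_appinfos[i];
    }

    return n;
}
```

For a partitioned parent, the patch skips this filter and takes the list from get_rel_partitions() instead, so only unpruned partitions are visited.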
Attachment: 0002-WIP-Interface-changes-for-partition_bound_-cmp-bsear.patch (text/plain; charset=UTF-8)
From 2830316f6d0f2e74a3e7960290df3a75aad10720 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 2/4] WIP: Interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the
void *probe and bool probe_is_bound arguments of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Tweaking the existing interface to allow
specifying that seemed awkward, so encapsulate it in
PartitionBoundCmpArg instead. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
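
To illustrate why the probe needs to carry its own datum count, here is a minimal standalone sketch of a bound search that compares only the first ndatums key columns. It is a simplified model, not the patch's code: keys are plain ints rather than Datums, only the non-bound (tuple datums) case is modeled, and SketchBound, SketchCmpArg, sketch_bound_cmp() and sketch_bound_bsearch() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_KEYS 2

/* Hypothetical stand-in for one row of boundinfo->datums. */
typedef struct SketchBound
{
    int         datums[SKETCH_MAX_KEYS];
} SketchBound;

/* Hypothetical stand-in for PartitionBoundCmpArg (non-bound case only). */
typedef struct SketchCmpArg
{
    const int  *datums;     /* probe values */
    int         ndatums;    /* number of leading key columns supplied */
} SketchCmpArg;

/*
 * Compare a bound with the probe on the first arg->ndatums columns only.
 * Returns <0, 0 or >0 as the bound is <, = or > the probe.
 */
static int
sketch_bound_cmp(const SketchBound *bound, const SketchCmpArg *arg)
{
    int         i;

    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound->datums[i] < arg->datums[i])
            return -1;
        if (bound->datums[i] > arg->datums[i])
            return 1;
    }
    return 0;
}

/*
 * Return the index of a bound that is <= the probe (the greatest such
 * bound unless an equal one is hit first), or -1 if every bound is
 * greater.  *is_equal tells whether the returned bound equals the probe
 * on the compared columns.
 */
static int
sketch_bound_bsearch(const SketchBound *bounds, int nbounds,
                     const SketchCmpArg *arg, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;
        int         cmpval = sketch_bound_cmp(&bounds[mid], arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Note that when the probe supplies fewer columns than the key has, several bounds can compare equal and the search may stop at any one of them, so a caller has to step to the neighbouring bounds itself.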
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ccf8a1fa67..0133748234 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -111,6 +111,30 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the user-defined
+ * partition bound of a given existing partition, while an instance of the
+ * following struct describes either a new partition bound being compared
+ * against existing bounds (is_bound is true in that case and either lbound
+ * or rbound is set), or a new tuple's partition key specified in datums
+ * (ndatums = number of partition key columns).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -139,14 +163,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -755,10 +780,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -809,6 +840,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -830,8 +862,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -845,9 +880,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2380,12 +2415,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -2397,6 +2435,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -2427,12 +2466,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -2629,12 +2669,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2656,11 +2696,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2668,17 +2708,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2689,12 +2747,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2708,20 +2767,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2734,8 +2792,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
Attachment: 0003-WIP-Implement-get_partitions_for_keys.patch (text/plain; charset=UTF-8)
From b39fe19609c4cdec4961455540cef50beebce33a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/4] WIP: Implement get_partitions_for_keys()
Disable constraint exclusion that occurs using internal partition
constraints, so that it's apparent what the new partition-pruning
code still needs to do to be able to create a plan matching the plan
that the traditional constraint-exclusion-based partition pruning
would result in.
---
src/backend/catalog/partition.c | 210 ++++++++++++++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 4 +
2 files changed, 209 insertions(+), 5 deletions(-)
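
As a rough illustration of the last step for range partitioning, the sketch below walks a window of the indexes array, collects the partition indexes found there, and requests the default partition when it meets a gap (a negative entry). It is a standalone simplification under stated assumptions: sketch_collect_range_partitions() is a hypothetical name, a caller-supplied int array stands in for the result List, and the indexes contents used with it are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Collect partition indexes found in indexes[minoff..maxoff].  A negative
 * entry means no partition covers that range; in that case the default
 * partition is appended last, if one exists (default_index >= 0).
 *
 * result[] must have room for (maxoff - minoff + 1) + 1 entries.  Returns
 * the number of entries written.
 */
static int
sketch_collect_range_partitions(const int *indexes, int minoff, int maxoff,
                                int default_index, int *result)
{
    int         n = 0;
    bool        scan_default = false;
    int         i;

    for (i = minoff; i <= maxoff; i++)
    {
        if (indexes[i] >= 0)
            result[n++] = indexes[i];
        else
            scan_default = true;    /* a gap: only the default covers it */
    }

    if (scan_default && default_index >= 0)
        result[n++] = default_index;

    return n;
}
```

With a made-up array such as {0, 1, -1, 2, 3, -1, 4} and the window 1..3, this yields partitions 1 and 2, plus the default partition if the table has one.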
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0133748234..9d4b7c1a7f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1391,8 +1391,6 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
* datum that the query's bounding keys allow to be returned for the query.
* Similarly, *max_datum_index. *null_partition_chosen returns whether
* the null partition will be scanned.
- *
- * TODO: Implement.
*/
List *
get_partitions_for_keys(Relation rel,
@@ -1403,12 +1401,214 @@ get_partitions_for_keys(Relation rel,
bool *null_partition_chosen,
bool *default_partition_chosen)
{
+ int i,
+ minoff,
+ maxoff;
List *result = NIL;
- int i;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundCmpArg arg;
+ bool is_equal,
+ scan_default = false;
+ int null_partition_idx = partdesc->boundinfo->null_index;
- for (i = 0; i < partdesc->nparts; i++)
- result = lappend_int(result, i);
+ *null_partition_chosen = false;
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if partdesc->boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keynullness[i] == IS_NULL)
+ {
+ if (null_partition_idx >= 0)
+ {
+ *null_partition_chosen = true;
+ result = list_make1_int(null_partition_idx);
+ }
+ else
+ result = NIL;
+
+ return result;
+ }
+ }
+
+ /*
+ * If query provides no quals, don't forget to scan the default partition.
+ */
+ if (n_minkeys == 0 && n_maxkeys == 0)
+ scan_default = true;
+
+ if (n_minkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = minkeys;
+ arg.ndatums = n_minkeys;
+ minoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (min_inclusive)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 ||
+ minoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !min_inclusive)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Records returned by the query will be > bounds[minoff],
+ * because min_scankey is >= bounds[minoff], that is, no
+ * records of the partition at minoff will be returned. Go
+ * to the next bound.
+ */
+ if (minoff < partdesc->boundinfo->ndatums - 1)
+ minoff += 1;
+
+ /*
+ * Make sure to skip a gap.
+ * Note: There are ndatums + 1 slots in the indexes array.
+ */
+ if (partdesc->boundinfo->indexes[minoff] < 0 &&
+ partdesc->boundinfo->indexes[minoff + 1] >= 0)
+ minoff += 1;
+ break;
+ }
+ }
+ else
+ minoff = 0;
+
+ if (n_maxkeys > 0 && partdesc->nparts > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = maxkeys;
+ arg.ndatums = n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, partdesc->boundinfo,
+ &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (max_inclusive)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 ||
+ maxoff >= partdesc->boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, partdesc->boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (max_inclusive)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !max_inclusive)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because bounds[maxoff] <= max_scankey, we may need to
+ * consider the next partition as well, in addition to
+ * the partition at maxoff and earlier.
+ */
+ if (!is_equal || max_inclusive)
+ maxoff += 1;
+
+ /* Make sure to skip a gap. */
+ if (partdesc->boundinfo->indexes[maxoff] < 0 && maxoff >= 1)
+ maxoff -= 1;
+ break;
+ }
+ }
+ else
+ maxoff = partdesc->boundinfo->ndatums - 1;
+
+ *min_datum_index = minoff;
+ *max_datum_index = maxoff;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ /*
+ * Multiple values may belong to the same partition, so make
+ * sure we don't add the same partition index again.
+ */
+ result = list_append_unique_int(result, partition_idx);
+ }
+
+ /* If no bounding keys exist, include the null partition too. */
+ if (null_partition_idx >= 0 &&
+ (keynullness[0] == -1 || keynullness[0] != IS_NOT_NULL))
+ {
+ *null_partition_chosen = true;
+ result = list_append_unique_int(result, null_partition_idx);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ int partition_idx = partdesc->boundinfo->indexes[i];
+
+ /*
+ * If a valid partition exists for this range, add its
+ * index, if not, the default partition (if any) would be
+ * covering that range, so request to include the same.
+ */
+ if (partition_idx >= 0)
+ result = lappend_int(result, partition_idx);
+ else
+ scan_default = true;
+ }
+ break;
+ }
+
+ if (scan_default && partdesc->boundinfo->default_index >= 0)
+ result = lappend_int(result, partdesc->boundinfo->default_index);
return result;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 49578c0684..ef84dac7f2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1160,7 +1160,9 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
List *pcqual;
+#endif
/*
* We assume the relation has already been safely locked.
@@ -1246,6 +1248,7 @@ get_relation_constraints(PlannerInfo *root,
}
}
+#ifdef USE_PARTITION_CONSTRAINT_FOR_PRUNING
/* Append partition predicates, if any */
pcqual = RelationGetPartitionQual(relation);
if (pcqual)
@@ -1263,6 +1266,7 @@ get_relation_constraints(PlannerInfo *root,
result = list_concat(result, pcqual);
}
+#endif
heap_close(relation, NoLock);
--
2.11.0
Attachment: 0004-Add-more-tests-for-the-new-partitioning-related-plan.patch (text/plain; charset=UTF-8)
From 5c0a9b4e102f4f6b44cd56a812f3d54855b45c6e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 4/4] Add more tests for the new partitioning-related planning
code
---
src/test/regress/expected/partition.out | 465 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 82 ++++++
4 files changed, 549 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..5242f4aa3d
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,465 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_null
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(3 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+(7 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+(15 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+(5 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+(11 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+(5 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+(11 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 30; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+(15 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(5 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(9 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+(15 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+(5 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+---------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+(7 rows)
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+(7 rows)
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p3
+ Filter: ((a > 1) AND (a = 10))
+ -> Seq Scan on mc3p4
+ Filter: ((a > 1) AND (a = 10))
+(11 rows)
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+ QUERY PLAN
+------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3efgh
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((b)::text = 'ab'::text) OR ((b)::text = 'ef'::text))
+(7 rows)
+
+drop table lp, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2fd3f2b1b1..2eb81fcf41 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 76b0de30a7..6611662149 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..4b562a655b
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,82 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* empty */
+explain (costs off) select * from rlp where a <= 31;
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+
+-- XXX - '<>' clauses cannot be handled yet
+explain (costs off) select * from lp where a <> 'a';
+
+-- XXX - redundant clause elimination does not happen yet
+explain (costs off) select * from mc3p where a = 10 and a > 1;
+
+-- XXX - the OR clauses don't contribute to partition-pruning yet
+explain (costs off) select * from rlp3 where b = 'ab' or b = 'ef';
+
+drop table lp, rlp, mc3p;
--
2.11.0
Hi Jesper.
Firstly, thanks for looking at the patch.
On 2017/09/26 22:00, Jesper Pedersen wrote:
Hi Amit,
On 09/15/2017 04:50 AM, Amit Langote wrote:
On 2017/09/15 11:16, Amit Langote wrote:
I will post rebased patches later today, although I think the overall
design of the patch on the planner side of things is not quite there yet.
Of course, your and others' feedback is greatly welcome.
Rebased patches attached. Because Dilip complained earlier today about
clauses of the form (const op var) not causing partition-pruning, I've
added code to commute the clause where it is required. Some other
previously mentioned limitations remain -- no handling of OR clauses, no
elimination of redundant clauses for given partitioning column, etc.
A note about 0001: this patch overlaps with
0003-Canonical-partition-scheme.patch from the partitionwise-join patch
series that Ashutosh Bapat posted yesterday [1]. Because I implemented
the planner-portion of this patch based on what 0001 builds, I'm posting
it here. It might actually turn out that we will review and commit
0003-Canonical-partition-scheme.patch on that thread, but meanwhile apply
0001 if you want to play with the later patches. I would certainly like
to review 0003-Canonical-partition-scheme.patch myself, but won't be able
to immediately (see below).
Could you share your thoughts on the usage of PartitionAppendInfo's
min_datum_idx / max_datum_idx? Especially in relation to hash partitions.
I'm looking at get_partitions_for_keys.
Sure. You may have noticed that min_datum_idx and max_datum_idx in
PartitionAppendInfo are offsets in the PartitionDescData.boundinfo.datums
array, of datums that lie within the query-specified range (that is, using
=, >, >=, <, <= btree operators in the query). That array contains bounds
of all partitions sorted in the btree operator class defined order, at
least for list and range partitioning. I haven't (yet) closely looked at
the composition of hash partition datums in PartitionBoundInfo, which
perhaps have different ordering properties (or maybe none) than list and
range partitioning datums.
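To make the offset idea concrete, here is a minimal, self-contained C sketch. It is not the patch's code: the names DatumRange, first_offset_after and find_datum_range are hypothetical, plain int values stand in for Datums, and ordinary integer comparison stands in for the btree support function. Assuming a sorted array of bound datums, it reduces an inclusive query range to a pair of offsets in the spirit of min_datum_idx / max_datum_idx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the (min_datum_idx, max_datum_idx) pair. */
typedef struct DatumRange
{
    int min_idx;    /* offset of the first bound inside the range, or -1 */
    int max_idx;    /* offset of the last bound inside the range, or -1 */
} DatumRange;

/*
 * Binary search over a sorted array. Returns the first offset whose datum
 * is >= key, or > key when strict is true; returns ndatums if there is none.
 */
static int
first_offset_after(const int *datums, int ndatums, int key, bool strict)
{
    int lo = 0;
    int hi = ndatums;

    while (lo < hi)
    {
        int  mid = lo + (hi - lo) / 2;
        bool before = strict ? (datums[mid] <= key) : (datums[mid] < key);

        if (before)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Offsets of the bounds that lie within [lo_key, hi_key], both inclusive.
 * A fuller version would also track exclusive and unbounded ends.
 */
DatumRange
find_datum_range(const int *datums, int ndatums, int lo_key, int hi_key)
{
    DatumRange r = {-1, -1};
    int        first;
    int        last;

    assert(ndatums >= 0 && (datums != NULL || ndatums == 0));

    first = first_offset_after(datums, ndatums, lo_key, false);
    last = first_offset_after(datums, ndatums, hi_key, true) - 1;

    if (first <= last)
    {
        r.min_idx = first;
        r.max_idx = last;
    }
    return r;
}
```

With the rlp bounds from the regression test laid out as {1, 10, 15, 20, 30, 31}, the range 10..20 would map to offsets 1 and 3, while a range that contains no bound at all leaves both offsets at -1.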
Now, since they are offsets of datums in PartitionBoundInfo, not indexes
of partitions themselves, their utility outside partition.c might be
questionable. But the partition-wise join patch, for example, is going to
check whether the PartitionBoundInfos in the RelOptInfos of two partitioned
tables are bound-to-bound equal [1] to determine if the two tables can be
joined partition-wise. Partition-pruning may select only a subset of
partitions of each of the joining partitioned tables. Equi-join
requirement for partition-wise join means that the subset of partitions
will be same on both sides of the join. My intent of having
min_datum_idx, max_datum_idx, along with contains_null_partition, and
contains_default_partition in PartitionAppendInfo is to have sort of a
cross check that those values end up being same on both sides of the join
after equi-join requirement has been satisfied. That is,
get_partitions_for_keys will have chosen the same set of partitions for
both partitioned tables and hence will have set the same values for those
fields.
As mentioned above, that may be enough for list and range partitioning,
but since hash partition datums do not appear to have the same properties
as list and range datums [2], we may need an additional field(s) to
describe the hash partition selected by get_partitions_for_keys. I guess
only one field will be enough, that will be the offset in the datums array
of the hash partition chosen for the query or -1 if query quals couldn't
conclusively determine one (maybe not all partition keys were specified to
be hashed or some or all used non-equality operator).
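The lookup described in [2] can be sketched in self-contained C as follows. This is an illustration under assumptions, not code from the hash partitioning patch: HashBound, HashBoundInfo, build_hash_bound_info and hash_partition_for are hypothetical names, the hash value is taken as given, and the indexes array is assumed to hold one slot per remainder of the largest modulus, with -1 where no partition accepts that remainder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_MODULUS 64          /* arbitrary limit for this sketch */

/* Hypothetical (modulus, remainder) pair describing one hash partition. */
typedef struct HashBound
{
    int modulus;
    int remainder;
} HashBound;

/* Hypothetical stand-in for the hash flavour of PartitionBoundInfo. */
typedef struct HashBoundInfo
{
    int greatest_modulus;
    int indexes[MAX_MODULUS];   /* remainder slot -> partition index, or -1 */
} HashBoundInfo;

/* Fill the slot array: partition i owns every slot congruent to its remainder. */
void
build_hash_bound_info(HashBoundInfo *info, const HashBound *bounds, int nparts)
{
    int i;
    int slot;

    info->greatest_modulus = 0;
    for (i = 0; i < nparts; i++)
    {
        assert(bounds[i].modulus > 0 && bounds[i].modulus <= MAX_MODULUS);
        assert(bounds[i].remainder >= 0 &&
               bounds[i].remainder < bounds[i].modulus);
        if (bounds[i].modulus > info->greatest_modulus)
            info->greatest_modulus = bounds[i].modulus;
    }

    for (slot = 0; slot < info->greatest_modulus; slot++)
        info->indexes[slot] = -1;

    for (i = 0; i < nparts; i++)
        for (slot = bounds[i].remainder;
             slot < info->greatest_modulus;
             slot += bounds[i].modulus)
            info->indexes[slot] = i;
}

/*
 * Return the partition index for a combined key hash, or -1 when the quals
 * did not pin down every key with equality (all_keys_known is false) or no
 * partition accepts the remainder.
 */
int
hash_partition_for(const HashBoundInfo *info, uint64_t hash, bool all_keys_known)
{
    if (!all_keys_known || info->greatest_modulus == 0)
        return -1;
    return info->indexes[hash % (uint64_t) info->greatest_modulus];
}
```

The -1 return corresponds to the "couldn't conclusively determine one" case above, in which every hash partition would still have to be scanned.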
Hope that answers your question at least to some degree. Now, there are
some points Robert mentioned in his reply that I will need to also
consider in the patch, which I'll go do now. :)
Thanks,
Amit
[1]: /messages/by-id/CAFjFpRc4UdCYknBai9pBu2GA1h4nZVNPDmzgs4jOkqFamT1huA@mail.gmail.com
[2]: It appears that in the hash partitioning case, unlike list and range
partitioning, PartitionBoundInfo doesn't contain values that are
directly comparable to query-specified constants, but a pair (modulus,
remainder) for each partition. We'll first hash *all* the key values
(mentioned in the query) using the partitioning hash machinery and
then determine the hash partition index by using
(hash % largest_modulus) as offset into the PartitionBoundInfo.indexes
array.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 27, 2017 at 6:09 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
On 2017/09/27 1:51, Robert Haas wrote:
On Tue, Sep 26, 2017 at 10:57 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:
One could advocate (*cough*) that the hash partition patch [1] should be
merged first in order to find other instances where other CommitFest
entries don't account for hash partitions at the moment in their method
signatures; Beena noted something similar in [2]. I know that you said
otherwise [3], but this is CommitFest 1, so there is time for a revert
later, and hash partitions are already useful in internal testing.
Well, that's a fair point. I was assuming that committing things in
that order would cause me to win the "least popular committer" award
at least for that day, but maybe not. It's certainly not ideal to
have to juggle that patch along and keep rebasing it over other
changes when it's basically done, and just waiting on other
improvements to land. Anybody else wish to express an opinion?
FWIW, I tend to agree that it would be nice to get the hash partitioning
patch in, even with old constraint exclusion based partition-pruning not
working for hash partitions. That way, it might be more clear what we
need to do in the partition-pruning patches to account for hash partitions.
+1
regards,
Amul
On Wed, Sep 27, 2017 at 6:52 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I was looking into the latest patch set, seems like we can reuse some
more code between this path and runtime pruning [1]
+ foreach(lc1, matchedclauses[i])
+ {
+     Expr *clause = lfirst(lc1);
+     Const *rightop = (Const *) get_rightop(clause);
+     Oid opno = ((OpExpr *) clause)->opno,
+         opfamily = rel->part_scheme->partopfamily[i];
+     StrategyNumber strategy;
+
+     strategy = get_op_opfamily_strategy(opno, opfamily);
+     switch (strategy)
+     {
+         case BTLessStrategyNumber:
+         case BTLessEqualStrategyNumber:
+             if (need_next_max)
+             {
+                 maxkeys[i] = rightop->constvalue;
+                 if (!maxkey_set[i])
+                     n_maxkeys++;
+                 maxkey_set[i] = true;
+                 max_incl = (strategy == BTLessEqualStrategyNumber);
+             }
+             if (strategy == BTLessStrategyNumber)
+                 need_next_max = false;
I think the above logic is common between this patch and the runtime
pruning. I think we can make
a reusable function. Here we are preparing minkey and maxkey of Datum
because we can directly fetch rightop->constvalue whereas for runtime
pruning we are making minkeys and maxkeys of Expr because during
planning time we don't have the values for the Param. I think we can
always make these minkey, maxkey array of Expr and later those can be
processed in whatever way we want it. So this path will fetch the
constval out of Expr and runtime pruning will Eval that expression at
runtime.
Does this make sense, or will it cause one level of extra processing
for this path, i.e. converting the Expr array to a Const array?
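A rough, self-contained C sketch of that idea follows. It assumes a much simplified expression type: KeyExpr, KeyExprKind and resolve_key_expr are hypothetical names, and long values stand in for Datums. The same array of bound expressions could then serve both paths, with plan-time pruning resolving only the constants and run-time pruning also resolving parameters once their values are known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an Expr that is a Const or a Param. */
typedef enum KeyExprKind
{
    KEY_CONST,
    KEY_PARAM
} KeyExprKind;

typedef struct KeyExpr
{
    KeyExprKind kind;
    long        constvalue;     /* valid when kind == KEY_CONST */
    int         paramid;        /* valid when kind == KEY_PARAM */
} KeyExpr;

/*
 * Resolve one bound expression to a value.
 *
 * Constants always resolve. Parameters resolve only when the caller passes
 * the run-time values in params; at plan time params is NULL, so a
 * parameter bound is reported as unknown (false).
 */
bool
resolve_key_expr(const KeyExpr *expr, const long *params, int nparams,
                 long *value)
{
    assert(expr != NULL && value != NULL);

    switch (expr->kind)
    {
        case KEY_CONST:
            *value = expr->constvalue;
            return true;
        case KEY_PARAM:
            if (params == NULL ||
                expr->paramid < 0 || expr->paramid >= nparams)
                return false;
            *value = params[expr->paramid];
            return true;
    }
    return false;
}
```

The extra step for the plan-time path is the single switch per key, which is the conversion cost asked about above.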
[1]: /messages/by-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 2017/09/28 13:58, Dilip Kumar wrote:
On Wed, Sep 27, 2017 at 6:52 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I was looking into the latest patch set, seems like we can reuse some
more code between this path and runtime pruning [1]
+ foreach(lc1, matchedclauses[i])
+ {
+     Expr *clause = lfirst(lc1);
+     Const *rightop = (Const *) get_rightop(clause);
+     Oid opno = ((OpExpr *) clause)->opno,
+         opfamily = rel->part_scheme->partopfamily[i];
+     StrategyNumber strategy;
+
+     strategy = get_op_opfamily_strategy(opno, opfamily);
+     switch (strategy)
+     {
+         case BTLessStrategyNumber:
+         case BTLessEqualStrategyNumber:
+             if (need_next_max)
+             {
+                 maxkeys[i] = rightop->constvalue;
+                 if (!maxkey_set[i])
+                     n_maxkeys++;
+                 maxkey_set[i] = true;
+                 max_incl = (strategy == BTLessEqualStrategyNumber);
+             }
+             if (strategy == BTLessStrategyNumber)
+                 need_next_max = false;
I think the above logic is common between this patch and the runtime
pruning. I think we can make
a reusable function. Here we are preparing minkey and maxkey of Datum
because we can directly fetch rightop->constvalue whereas for runtime
pruning we are making minkeys and maxkeys of Expr because during
planning time we don't have the values for the Param. I think we can
always make these minkey, maxkey array of Expr and later those can be
processed in whatever way we want it. So this path will fetch the
constval out of Expr and runtime pruning will Eval that expression at
runtime.
I think that makes sense. In fact we could even move the minkey/maxkey
collection code to match_clauses_to_partkey() itself. No need for a
different function and worrying about defining a separate interface for
the same. We match clauses exactly because we want to extract the
constant or param values out of them. No need to do the two activities
independently and in different places.
Does this make sense, or will it cause one level of extra processing
for this path, i.e. converting the Expr array to a Const array?
Hm, it's not such a big cost to pay I'd think.
I will update the planner patch accordingly.
Thanks,
Amit
On Thu, Sep 28, 2017 at 1:44 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/09/28 13:58, Dilip Kumar wrote:
I think the above logic is common between this patch and the runtime
pruning. I think we can make
a reusable function. Here we are preparing minkey and maxkey of Datum
because we can directly fetch rightop->constvalue whereas for runtime
pruning we are making minkeys and maxkeys of Expr because during
planning time we don't have the values for the Param. I think we can
always make these minkey, maxkey array of Expr and later those can be
processed in whatever way we want it. So this path will fetch the
constval out of Expr and runtime pruning will Eval that expression at
runtime.
I think that makes sense. In fact we could even move the minkey/maxkey
collection code to match_clauses_to_partkey() itself. No need for a
different function and worrying about defining a separate interface for
the same. We match clauses exactly because we want to extract the
constant or param values out of them. No need to do the two activities
independently and in different places.
+1
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 27 September 2017 at 14:22, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
- 0001 includes refactoring that Dilip proposed upthread [1] (added him as
an author). I slightly tweaked his patch -- renamed the function
get_matching_clause to match_clauses_to_partkey, similar to
match_clauses_to_index.
Hi Amit,
I've made a pass over the 0001 patch while trying to get myself up to
speed with the various developments that are going on in partitioning
right now.
These are just my thoughts from reading over the patch. I understand
that there's quite a bit up in the air right now about how all this is
going to work, but I have some thoughts about that too, which I
wouldn't mind some feedback on as my line of thought may be off.
Anyway, I will start with some small typos that I noticed while reading:
partition.c:
+ * telling what kind of NullTest has been applies to the corresponding
"applies" should be "applied"
plancat.c:
* might've occurred to satisfy the query. Rest of the fields are set in
"Rest of the" should be "The remaining"
And onto more serious stuff:
allpaths.c:
get_rel_partitions()
I wonder if this function does not belong in partition.c. I'd have
expected a function to exist per partition type; RANGE and LIST, I'd
imagine, should each have their own function in partition.c to eliminate
partitions which cannot possibly match, and return the remainder. I see people
speaking of HASH partitioning, but we might one day also want
something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine
routing records to be processed into foreign tables where you never
need to query them again). It would be good if we could easily expand
this list and not have to touch files all over the optimizer to do
that. Of course, there would be other work to do in the executor to
implement any new partitioning method too.
I know it is mentioned somewhere that get_rel_partitions() is not
so smart about WHERE partkey > 20 AND partkey > 10; the code
does not understand which qual is more restrictive. I think you could
probably follow the same rules here that are done in
eval_const_expressions(). Over there I see that evaluate_function()
will call anything that's not marked as volatile. I'd imagine, for
each partition key, you'd want to store a Datum with the minimum and
maximum possible value based on the quals processed. If either the
minimum or maximum is still set to NULL, then it's unbounded at that
end. If you encounter partkey = Const, then minimum and maximum can be
set to the value of that Const. From there you can likely ignore any
other quals for that partition key, as if the query did contain
another qual with partkey = SomeOtherConst, then that would have
become a gating qual during the constant folding process. This way if
the user had written WHERE partkey >= 1 AND partkey <= 1 the
evaluation would end up the same as if they'd written WHERE partkey =
1 as the minimum and maximum would be the same value in both cases,
and when those two values are the same then it would mean just one
theoretical binary search on a partition range to find the correct
partition instead of two.
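To make the min/max idea above a bit more concrete, here is a minimal sketch of how bounds for a single partition key could be tightened as quals are processed. Everything in it is an assumption for illustration: KeyBounds, QualOp and the helper functions are hypothetical names that are not in the patch, and a plain int stands in for a Datum compared through the partitioning operator class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operator tags for quals of the form "partkey OP constant". */
typedef enum { QUAL_LT, QUAL_LE, QUAL_EQ, QUAL_GE, QUAL_GT } QualOp;

/* Tightest bounds collected so far for one partition key column. */
typedef struct KeyBounds
{
    bool has_min;        /* false means unbounded below */
    bool has_max;        /* false means unbounded above */
    int  min;            /* int stands in for a Datum */
    int  max;
    bool min_incl;       /* is min itself an allowed value? */
    bool max_incl;       /* is max itself an allowed value? */
    bool contradictory;  /* no value can satisfy all quals seen */
} KeyBounds;

static void
bounds_init(KeyBounds *b)
{
    b->has_min = b->has_max = false;
    b->min = b->max = 0;
    b->min_incl = b->max_incl = false;
    b->contradictory = false;
}

/* Keep the larger lower bound; on a tie, exclusive beats inclusive. */
static void
tighten_min(KeyBounds *b, int v, bool incl)
{
    if (!b->has_min || v > b->min || (v == b->min && !incl))
    {
        b->has_min = true;
        b->min = v;
        b->min_incl = incl;
    }
}

/* Keep the smaller upper bound; on a tie, exclusive beats inclusive. */
static void
tighten_max(KeyBounds *b, int v, bool incl)
{
    if (!b->has_max || v < b->max || (v == b->max && !incl))
    {
        b->has_max = true;
        b->max = v;
        b->max_incl = incl;
    }
}

/* Fold one more ANDed qual into the bounds. */
static void
bounds_add_qual(KeyBounds *b, QualOp op, int v)
{
    assert(b != NULL);
    switch (op)
    {
        case QUAL_GT: tighten_min(b, v, false); break;
        case QUAL_GE: tighten_min(b, v, true);  break;
        case QUAL_LT: tighten_max(b, v, false); break;
        case QUAL_LE: tighten_max(b, v, true);  break;
        case QUAL_EQ:
            tighten_min(b, v, true);
            tighten_max(b, v, true);
            break;
    }
    if (b->has_min && b->has_max &&
        (b->min > b->max ||
         (b->min == b->max && !(b->min_incl && b->max_incl))))
        b->contradictory = true;
}

/* True when the bounds pin the key to exactly one value. */
static bool
bounds_is_equality(const KeyBounds *b)
{
    return b->has_min && b->has_max && !b->contradictory &&
           b->min == b->max;
}
```

With this, partkey > 20 AND partkey > 10 leaves only the lower bound 20, and partkey >= 1 AND partkey <= 1 ends up with the same bounds as partkey = 1.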
I see in get_partitions_for_keys you've crafted the function to return
something to identify which partitions need to be scanned. I think it
would be nice to see a special element in the partition array for the
NULL and DEFAULT partition. I imagine 0 and 1, but obviously, these
would be defined constants somewhere. The signature of that function
is not so pretty and that would likely tidy it up a bit. The matching
partition indexes could be returned as a Bitmapset, yet, I don't
really see any code which handles adding the NULL and DEFAULT
partition in get_rel_partitions() either, maybe I've just not looked
hard enough yet...
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 28, 2017 at 5:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I'd imagine, for
each partition key, you'd want to store a Datum with the minimum and
maximum possible value based on the quals processed. If either the
minimum or maximum is still set to NULL, then it's unbounded at that
end. If you encounter partkey = Const, then minimum and maximum can be
set to the value of that Const. From there you can likely ignore any
other quals for that partition key, as if the query did contain
another qual with partkey = SomeOtherConst, then that would have
become a gating qual during the constant folding process. This way if
the user had written WHERE partkey >= 1 AND partkey <= 1 the
evaluation would end up the same as if they'd written WHERE partkey =
1 as the minimum and maximum would be the same value in both cases,
and when those two values are the same then it would mean just one
theoretical binary search on a partition range to find the correct
partition instead of two.
I have not looked at the code submitted here in detail yet but I do
think we should try to avoid wasting cycles in the
presumably-quite-common case where equality is being tested. The
whole idea of thinking of this as minimum/maximum seems like it might
be off precisely for that reason.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017/09/30 1:28, Robert Haas wrote:
On Thu, Sep 28, 2017 at 5:16 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I'd imagine, for
each partition key, you'd want to store a Datum with the minimum and
maximum possible value based on the quals processed. If either the
minimum or maximum is still set to NULL, then it's unbounded at that
end. If you encounter partkey = Const, then minimum and maximum can be
set to the value of that Const. From there you can likely ignore any
other quals for that partition key, as if the query did contain
another qual with partkey = SomeOtherConst, then that would have
become a gating qual during the constant folding process. This way if
the user had written WHERE partkey >= 1 AND partkey <= 1 the
evaluation would end up the same as if they'd written WHERE partkey =
1 as the minimum and maximum would be the same value in both cases,
and when those two values are the same then it would mean just one
theoretical binary search on a partition range to find the correct
partition instead of two.
I have not looked at the code submitted here in detail yet but I do
think we should try to avoid wasting cycles in the
presumably-quite-common case where equality is being tested. The
whole idea of thinking of this as minimum/maximum seems like it might
be off precisely for that reason.
I agree. Equality checks are going to be common enough to warrant
handling them specially, instead of implementing equality-pruning on top of
the min/max framework.
Thanks,
Amit
On Sun, Oct 1, 2017 at 9:13 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I agree. Equality checks are going to be common enough to warrant
handling them specially, instead of implementing equality-pruning on top of
the min/max framework.
What you might do is pass <btree-strategy-number, bounds> and
optionally allow a second <btree-strategy-number, bounds>. Then for
the common case of equality you can pass BTEqualStrategyNumber and for
a range bounded at both ends you can pass BTGreaterStrategyNumber or
BTGreaterEqualStrategyNumber for one bound and BTLessStrategyNumber or
BTLessEqualStrategyNumber for the other.
Not sure if this is exactly the right idea but it's what pops to mind.
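A rough sketch of what such an interface might look like, just to make the shape concrete. The strategy-number macros carry the same values as the btree strategy numbers in PostgreSQL's access/stratnum.h, redefined here so the example compiles standalone; PruneBound, PruneRequest and the helper functions are hypothetical names, and int stands in for the real bound values.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as in access/stratnum.h, repeated for a standalone build. */
#define BTLessStrategyNumber          1
#define BTLessEqualStrategyNumber     2
#define BTEqualStrategyNumber         3
#define BTGreaterEqualStrategyNumber  4
#define BTGreaterStrategyNumber       5

/* One <btree-strategy-number, bound> pair (hypothetical). */
typedef struct PruneBound
{
    int strategy;
    int value;      /* int stands in for the bound datum(s) */
} PruneBound;

/* A mandatory first pair and an optional second one (hypothetical). */
typedef struct PruneRequest
{
    PruneBound first;
    bool       has_second;
    PruneBound second;
} PruneRequest;

static PruneRequest
make_equality(int value)
{
    PruneRequest r = { { BTEqualStrategyNumber, value }, false, { 0, 0 } };

    return r;
}

static PruneRequest
make_range(int lo_strategy, int lo, int hi_strategy, int hi)
{
    PruneRequest r = { { lo_strategy, lo }, true, { hi_strategy, hi } };

    assert(lo_strategy == BTGreaterStrategyNumber ||
           lo_strategy == BTGreaterEqualStrategyNumber);
    assert(hi_strategy == BTLessStrategyNumber ||
           hi_strategy == BTLessEqualStrategyNumber);
    return r;
}

/* Does key satisfy "key <strategy> bound"? */
static bool
bound_accepts(const PruneBound *b, int key)
{
    switch (b->strategy)
    {
        case BTLessStrategyNumber:         return key <  b->value;
        case BTLessEqualStrategyNumber:    return key <= b->value;
        case BTEqualStrategyNumber:        return key == b->value;
        case BTGreaterEqualStrategyNumber: return key >= b->value;
        case BTGreaterStrategyNumber:      return key >  b->value;
    }
    return false;
}

static bool
request_accepts(const PruneRequest *r, int key)
{
    if (!bound_accepts(&r->first, key))
        return false;
    return !r->has_second || bound_accepts(&r->second, key);
}

/* One bound lookup per pair: equality needs one, a closed range two. */
static int
request_num_lookups(const PruneRequest *r)
{
    return r->has_second ? 2 : 1;
}
```

The point of the shape is that the equality case carries a single pair and so never pays for a second lookup.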
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi David.
Thanks a lot for your review comments and sorry it took me a while to reply.
On 2017/09/28 18:16, David Rowley wrote:
On 27 September 2017 at 14:22, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
- 0001 includes refactoring that Dilip proposed upthread [1] (added him as
an author). I slightly tweaked his patch -- renamed the function
get_matching_clause to match_clauses_to_partkey, similar to
match_clauses_to_index.
Hi Amit,
I've made a pass over the 0001 patch while trying to get myself up to
speed with the various developments that are going on in partitioning
right now.
These are just my thoughts from reading over the patch. I understand
that there's quite a bit up in the air right now about how all this is
going to work, but I have some thoughts about that too, which I
wouldn't mind some feedback on as my line of thought may be off.
Anyway, I will start with some small typos that I noticed while reading:
partition.c:
+ * telling what kind of NullTest has been applies to the corresponding
"applies" should be "applied"
Will fix.
plancat.c:
* might've occurred to satisfy the query. Rest of the fields are set in
"Rest of the" should be "The remaining"
Will fix.
And onto more serious stuff:
allpaths.c:
get_rel_partitions()
I wonder if this function does not belong in partition.c. I'd have
expected a function to exist per partition type; RANGE and LIST, I'd
imagine, should each have their own function in partition.c to eliminate
partitions which cannot possibly match, and return the remainder. I see people
speaking of HASH partitioning, but we might one day also want
something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine
routing records to be processed into foreign tables where you never
need to query them again). It would be good if we could easily expand
this list and not have to touch files all over the optimizer to do
that. Of course, there would be other work to do in the executor to
implement any new partitioning method too.
I think there will have to be at least some code in the optimizer. That
is, the code to match the query to the table's partition keys and using
the constant values therein to then look up the table's partitions. More
importantly, once the partitions (viz. their offsets in the table's
partition descriptor) have been looked up by partition.c, we must be able
to then map them to their planner data structures viz. their
AppendRelInfo, and subsequently RelOptInfo. This last part will have to
be in the optimizer. In fact, that was the role of get_rel_partitions in
the initial versions of the patch, when neither the code for matching
keys nor the code for pruning using constants was implemented.
One may argue that the first part, that is, matching clauses to
the partition key and the subsequent lookup of partitions using constants,
could both be implemented in partition.c, but it seems better to me to
keep at least the part of matching clauses to the partition keys within
the planner (just like matching clauses to indexes is done by the
planner). Looking up partitions using constants cannot be done outside
partition.c anyway, so that's something we have to implement there.
I know there's mention of it somewhere about get_rel_partitions() not
being so smart about WHERE partkey > 20 AND partkey > 10, the code
does not understand what's more restrictive. I think you could
probably follow the same rules here that are done in
eval_const_expressions(). Over there I see that evaluate_function()
will call anything that's not marked as volatile.
Hmm, unless I've missed it, I don't see in evaluate_function() anything
about determining which clause is more restrictive. AFAIK, such
determination depends on the btree operator class semantics (at least in
the most common cases where, say, ">" means greater than in a sense that
btree code uses it), so I was planning to handle it the way btree code
handles it in _bt_preprocess_keys(). In fact, the existing code in
predtest.c, which makes decisions in a similar vein, also relies on btree
semantics. It's OK to do so, because partitioning methods that exist
today and for which we'd like to have such smarts use btree semantics to
partition the data. Also, we won't need to optimize such cases for hash
partitioning anyway.
I'd imagine, for
each partition key, you'd want to store a Datum with the minimum and
maximum possible value based on the quals processed. If either the
minimum or maximum is still set to NULL, then it's unbounded at that
end. If you encounter partkey = Const, then minimum and maximum can be
set to the value of that Const. From there you can likely ignore any
other quals for that partition key, as if the query did contain
another qual with partkey = SomeOtherConst, then that would have
become a gating qual during the constant folding process. This way if
the user had written WHERE partkey >= 1 AND partkey <= 1 the
evaluation would end up the same as if they'd written WHERE partkey =
1 as the minimum and maximum would be the same value in both cases,
and when those two values are the same then it would mean just one
theoretical binary search on a partition range to find the correct
partition instead of two.
Given the way the patch recognizes a given qual as contributing to the
equal key or minimum key or maximum key, it will not conclude the above to
in fact be an equal key, because that presumably would require comparing
clauses among each other to make such a discovery. I'm slightly hesitant
to add code to do that in the first version of the patch. That is, for the
time being, let WHERE partkey >= 1 AND partkey <= 1 be handled by passing 1
as both minimum and maximum key, which requires two binary searches.
Whereas, WHERE partkey = 1 would require only one. Planner code to get
rid of the extra binary search lookup could come later, IMHO.
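To illustrate the one-versus-two lookups point, here is a toy sketch; it is not the patch's code. It assumes a range-partitioned table described only by a sorted array of exclusive upper bounds, with the first partition unbounded below, and counts searches with a global counter. The names partition_for_value, lookup_eq and lookup_range are made up for this example.

```c
#include <assert.h>

/* Counts binary searches performed, for demonstration only. */
static int nsearches = 0;

/*
 * Return the index of the first partition whose exclusive upper bound is
 * greater than key, or nparts if key lies beyond the last bound.
 */
static int
partition_for_value(const int *upper, int nparts, int key)
{
    int lo = 0;
    int hi = nparts;

    assert(nparts > 0);
    nsearches++;
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (upper[mid] > key)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* partkey = key: a single search; -1 if no partition accepts key. */
static int
lookup_eq(const int *upper, int nparts, int key)
{
    int idx = partition_for_value(upper, nparts, key);

    return idx < nparts ? idx : -1;
}

/* lo <= partkey <= hi: one search per end of the range. */
static void
lookup_range(const int *upper, int nparts, int lo, int hi,
             int *min_idx, int *max_idx)
{
    int first = partition_for_value(upper, nparts, lo);
    int last = partition_for_value(upper, nparts, hi);

    if (lo > hi || first >= nparts)
    {
        *min_idx = *max_idx = -1;   /* nothing to scan */
        return;
    }
    if (last >= nparts)
        last = nparts - 1;
    *min_idx = first;
    *max_idx = last;
}
```

In this toy, lookup_eq(upper, n, 1) and lookup_range(upper, n, 1, 1, ...) select the same partition, but the latter performs two searches to get there.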
I see in get_partitions_for_keys you've crafted the function to return
something to identify which partitions need to be scanned. I think it
would be nice to see a special element in the partition array for the
NULL and DEFAULT partition. I imagine 0 and 1, but obviously, these
would be defined constants somewhere. The signature of that function
is not so pretty and that would likely tidy it up a bit. The matching
partition indexes could be returned as a Bitmapset, yet, I don't
really see any code which handles adding the NULL and DEFAULT
partition in get_rel_partitions() either, maybe I've just not looked
hard enough yet...
The new version of the patch, which I will post soon, cleans up the interface of
get_partitions_for_keys quite a bit, particularly the way selected
partitions are returned, for which I adopted an idea that Robert Haas
mentioned [2]. When it recognizes that a sequence of consecutive
partitions is to be scanned, it will return the starting and ending
offsets as *min_part_idx and *max_part_idx. Those that don't fit this
pattern (of which there should be only a few in many cases) are returned
in a Bitmapset, as a supposedly unordered set of partitions. Since NULL
and DEFAULT partitions are partitions of this latter category, they would
be included in the Bitmapset if it turns out that the query will need to scan
them after all.
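As a rough picture of how a caller might consume such a result, here is a small sketch. It is only an illustration: PrunedParts and its helpers are hypothetical names, and a 64-bit mask stands in for PostgreSQL's Bitmapset, which limits this toy to 64 partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PARTS 64    /* limit imposed by the uint64_t stand-in */

/* Hypothetical result: a contiguous range plus scattered extras. */
typedef struct PrunedParts
{
    int      min_part_idx;  /* -1 when there is no contiguous range */
    int      max_part_idx;  /* inclusive */
    uint64_t other_parts;   /* bit i set means partition i is selected */
} PrunedParts;

/* Is partition idx selected, via either the range or the bitmap? */
static bool
pruned_contains(const PrunedParts *p, int idx)
{
    assert(idx >= 0 && idx < MAX_PARTS);
    if (p->min_part_idx >= 0 &&
        idx >= p->min_part_idx && idx <= p->max_part_idx)
        return true;
    return ((p->other_parts >> idx) & 1) != 0;
}

/* Number of selected partitions among the first nparts. */
static int
pruned_count(const PrunedParts *p, int nparts)
{
    int count = 0;
    int idx;

    assert(nparts >= 0 && nparts <= MAX_PARTS);
    for (idx = 0; idx < nparts; idx++)
    {
        if (pruned_contains(p, idx))
            count++;
    }
    return count;
}
```

For example, a range of 2..4 plus bit 7 (say, the default partition) selects four partitions in total.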
Thanks again.
Regards,
Amit
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf8269
[2]: /messages/by-id/CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
On 2017/09/27 10:22, Amit Langote wrote:
Thanks for the reminder. Just thought I'd say that while I'm actually
done rebasing itself (attaching rebased patches to try 'em out), I'm now
considering Robert's comments and will be busy for a bit revising things
based on those comments.
And here is the updated, significantly re-designed patch set. I'm
dropping the WIP- from the patches' names and marking this patch set as
the v1 set (Jesper Pedersen pointed out to me offline that the patches
didn't have the vNumber before).
Significant points of revision:
* After thinking a bit more about the applicability/re-usability of the
code being discussed here in the run-time pruning case, I came to the
conclusion that my previous approach was a bit wrongheaded (as perhaps
David Rowley was also thinking when he commented on some aspects of
the old patch's design at [1], so thanks to him for prodding me in what I
ended up thinking to be a good direction after all).
With the previous approach, a bit too much work would be done by the
planner with no possibility of the code doing that work being useful in
the executor (interface involved passing RelOptInfo *). So, if some
optimization trick that would lead to better pruning decision depended on
the constant values in all the clauses being available, we'd have to skip
that optimization for clauses that would otherwise be chosen as run-time
pruning clauses, because by definition, they would not have constant
values available. In the new design, planner code limits itself to only
matching the clauses to partition key (checking things like whether the
operator of a clause matched to a partition column is compatible with
partitioning, etc.) and adding every matched clause to a list partclauses.
That should work unchanged for both the plan-time pruning case and the
run-time pruning case. We don't look at the supposedly-constant operands
of clauses in the aforementioned planner code at all.
Now if the clauses in partclauses are all known to contain constant
operand values (the plan-time pruning case), the planner can immediately pass them
to partition.c to analyze those clauses, extract bounding keys in a form
suitable to do lookup in PartitionBoundInfo and prune partitions that
won't satisfy those bounding keys (and hence the clauses).
If partclauses contains clauses that don't have a constant operand (the
run-time pruning case), the planner doesn't go to partition.c just yet; instead,
the list is stuffed into the plan (Append) node and partition.c is consulted only when all
the constant values are available (the patch at [2] will implement that).
* Unlike the previous approach where partition.c would return a list of
integer indexes, where the individual indexes would be those of the bound
datums and not those of partitions themselves, in the new design, indexes
of the selected partitions (their offsets in the PartitionDesc.oids array)
are returned in a way that Robert suggested [3] -- as a range of
contiguous partitions whenever possible (in the form of *min_part_idx and
*max_part_idx) and/or as a set of partitions appearing in no particular
order (in the form of a Bitmapset). The second form is required not only for
default/null-only/list partitions (which do not have any notion of
ordering among each other), but also when individual arms of an OR clause
select partitions scattered all over the place.
* In add_paths_to_append_rel(), the partitioned_rels list passed to copy
into the Append path is no longer the same as the one found in
PlannerInfo.pcinfo_list, because the latter contains *all* partitioned
child relations in a partition tree. Instead, the patch teaches it to
only include *live* partitioned child relations.
* Since the partitionwise join code looks at *all* entries in
RelOptInfo.part_rels of both the joining partitioned relations, there is
some new code there to make dead partitions' RelOptInfos look valid with a
dummy path *after-the-fact*. That's because set_append_rel_size(), whose
job it is to initialize certain fields of child RelOptInfos, will do it
only for *live* partitions with the new arrangement, where only live
partitions of a partitioned table are processed by the main loop of
set_append_rel_size().
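The split between plan-time and run-time pruning described in the first point above might be pictured roughly as follows. This is a simplified sketch and not code from the patch set: MatchedClause, OperandKind, PruneWhen and choose_prune_time are hypothetical names, and the real partclauses list holds expression nodes rather than a flat struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the non-key operand of a matched clause looks like (hypothetical). */
typedef enum { OPERAND_CONST, OPERAND_PARAM } OperandKind;

/* A clause already matched to a partition key column (hypothetical). */
typedef struct MatchedClause
{
    int         partkey_col;    /* which partition key column it matched */
    OperandKind operand_kind;   /* is the value known at plan time? */
    int         const_value;    /* meaningful only for OPERAND_CONST */
} MatchedClause;

typedef enum { PRUNE_AT_PLAN_TIME, PRUNE_AT_RUN_TIME } PruneWhen;

/*
 * Pruning can be done immediately only if every matched clause carries a
 * constant; otherwise the whole list is deferred until values are known.
 */
static PruneWhen
choose_prune_time(const MatchedClause *partclauses, size_t n)
{
    size_t i;

    assert(partclauses != NULL || n == 0);
    for (i = 0; i < n; i++)
    {
        if (partclauses[i].operand_kind != OPERAND_CONST)
            return PRUNE_AT_RUN_TIME;
    }
    return PRUNE_AT_PLAN_TIME;
}
```

Note that the matching step itself never looks at the operand values, which is what lets the same partclauses list serve both cases.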
Some notes about the regression tests:
Patch 0001 adds new tests for partition-pruning. Constraint exclusion
using internal partition constraints is not disabled in the code until the
last patch, which implements the last piece needed for the new partition
pruning to do any real work. With that patch, we see some differences in
the plan generated using the new partition-pruning code which appear in
the patch as the diffs to expected/inherit.out and expected/partition.out
(latter is the new output file added by 0001). I've almost convinced
myself that those diffs are simply artifacts of the difference in
implementation between constraint exclusion and the new partition-pruning
code and do not change the output that the plans produce. The difference
stems from the fact that either the old or the new method, in some cases, fails to
prune away a partition that should have been pruned. OTOH, in neither case do we
prune away a partition that shouldn't have been. :)
Description of the attached patches:
0001: add new tests for partition-pruning
0002: patch that makes all the changes needed in the planner (adds a stub
function in partition.c)
0003: patch that implements the aforementioned stub (significant amount of
code to analyze partition clauses and gin up bounding keys to
compare with the values in PartitionBoundInfo; the actual function
that will do the comparison is just a stub as of this patch)
0004: make some preparatory changes to partition_bound_cmp/bsearch, to be
able to pass incomplete partition keys (aka, prefix of a multi-
column key) for comparison with the values in PartitionBoundInfo
(just a refactoring patch)
0005: implements the stub mentioned in 0003 and finally gets the new
partition-pruning working (also disables constraint exclusion using
internal partition constraints by teaching get_relation_constraints
to not include those).
Feedback greatly welcome.
Thanks,
Amit
[1]: /messages/by-id/CAKJS1f8Jzix8cs7tCDS_qNPd0CetHjB8x9fmLG4OTbCfthgo1w@mail.gmail.com
[2]: /messages/by-id/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
[3]: /messages/by-id/CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
Attachments:
0003-Implement-get_partitions_from_clauses-v1.patch (text/plain; charset=UTF-8)
From a484fbf69a1debe97bbc3ef724ad858275a44688 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1034 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1031 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..abccb77393 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartitionScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartitionScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartitionScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set.) Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static Datum partkey_datum_from_expr(const Expr *expr);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartitionScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1439,15 +1554,928 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionSet *partset;
+
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartitionScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, 0,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract the constant value from expr and return it as a Datum
+ */
+
+static Datum
+partkey_datum_from_expr(const Expr *expr)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ return ((Const *) expr)->constvalue;
+
+ default:
+ elog(ERROR, "invalid expression for partition key");
+ }
+
+ Assert(false); /* should never get here! */
+ return 0;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with any
+ * nullness constraints, and return that information in the output argument
+ * *keys (the number of keys is the return value).
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Minimum and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * find mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ memset(keynullness, 0, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ keynullness[i] = -1;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ elemno;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ /* Use a separate counter; 'i' indexes the partition key columns. */
+ for (elemno = 0; elemno < num_elems; elemno++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[elemno])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[elemno],
+ false, elembyval);
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and generate its PartClauseSetOr. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate PartClauseValSet */
+ for (i = 0; i < n_eqkeys; i++)
+ keys->eqkeys[i] = partkey_datum_from_expr(eqkey_exprs[i]);
+ keys->n_eqkeys = n_eqkeys;
+
+ for (i = 0; i < n_minkeys; i++)
+ keys->minkeys[i] = partkey_datum_from_expr(minkey_exprs[i]);
+ keys->n_minkeys = n_minkeys;
+ keys->min_incl = min_incl;
+
+ for (i = 0; i < n_maxkeys; i++)
+ keys->maxkeys[i] = partkey_datum_from_expr(maxkey_exprs[i]);
+ keys->n_maxkeys = n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+
+ return n_eqkeys + n_minkeys + n_maxkeys + n_keynullness;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for each btree strategy number for
+ * this partition key column. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ leftarg_const = partkey_datum_from_expr(leftarg->constarg);
+ rightarg_const = partkey_datum_from_expr(rightarg->constarg);
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartitionScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
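
As an aside on the PartitionSet representation in the patch above: the following is a minimal standalone sketch of the range-or-bitmap union idea, not the patch's code. MiniPartSet and every function name here are hypothetical, a uint64_t stands in for the Bitmapset (so at most 64 partitions), and adjacency is assumed to mean that one inclusive range ends exactly one index before the other begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionSet; a uint64_t replaces Bitmapset. */
typedef struct MiniPartSet
{
    bool     use_range;  /* still representable as one contiguous range? */
    int      min_idx;    /* inclusive; -1 when the range is empty */
    int      max_idx;    /* inclusive; -1 when the range is empty */
    uint64_t other;      /* bit i set => partition i is a member */
} MiniPartSet;

static bool
range_empty(const MiniPartSet *s)
{
    return s->min_idx < 0 && s->max_idx < 0;
}

static bool
range_overlap(const MiniPartSet *a, const MiniPartSet *b)
{
    return !range_empty(a) && !range_empty(b) &&
           a->min_idx <= b->max_idx && b->min_idx <= a->max_idx;
}

/* Assumption of this sketch: adjacent means "ends right before the other". */
static bool
range_adjacent(const MiniPartSet *a, const MiniPartSet *b)
{
    return !range_empty(a) && !range_empty(b) &&
           (a->max_idx + 1 == b->min_idx || b->max_idx + 1 == a->min_idx);
}

/* Set bits lo..hi (inclusive) in dst->other. */
static void
add_range_to_bits(MiniPartSet *dst, int lo, int hi)
{
    int i;

    assert(lo >= 0 && lo <= hi && hi < 64);
    for (i = lo; i <= hi; i++)
        dst->other |= UINT64_C(1) << i;
}

/* Build a set holding the single inclusive range lo..hi. */
static MiniPartSet
miniset_range(int lo, int hi)
{
    MiniPartSet s = {true, lo, hi, 0};

    return s;
}

/* Union b into a, keeping the range form only while it stays contiguous. */
static void
miniset_union(MiniPartSet *a, const MiniPartSet *b)
{
    if (a->use_range && !range_empty(a) && !range_empty(b))
    {
        if (range_overlap(a, b) || range_adjacent(a, b))
        {
            if (b->min_idx < a->min_idx)
                a->min_idx = b->min_idx;
            if (b->max_idx > a->max_idx)
                a->max_idx = b->max_idx;
        }
        else
        {
            /* Disjoint ranges: fall back to the bitmap representation. */
            add_range_to_bits(a, a->min_idx, a->max_idx);
            add_range_to_bits(a, b->min_idx, b->max_idx);
            a->use_range = false;
            a->min_idx = a->max_idx = -1;
        }
    }
    else if (a->use_range && range_empty(a))
    {
        a->min_idx = b->min_idx;
        a->max_idx = b->max_idx;
    }
    else if (!range_empty(b))
        add_range_to_bits(a, b->min_idx, b->max_idx);

    a->other |= b->other;
}
```

With this sketch, the union of ranges 0..1 and 2..3 stays a single range 0..3, while adding 6..7 afterwards switches the set to the bitmap form.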
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v1.patch (text/plain; charset=UTF-8)
From 0248e01994a97a9845d60af4cc7bad7571f2370b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
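
To illustrate why the number of datums matters, here is a rough standalone sketch, not the patch's code. Plain ints stand in for Datums and for the partitioning support function, and ProbeArg and bound_datum_cmp are hypothetical names. It compares a bound against only the first ndatums values of the probe, so a probe that supplies just a prefix of the key can compare as equal to a bound it would otherwise sort after.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified counterpart of a non-bound comparison argument. */
typedef struct ProbeArg
{
    const int *datums;   /* values supplied by the caller */
    int        ndatums;  /* how many of them are valid */
} ProbeArg;

/*
 * Return <0, 0, or >0 according to whether the bound is less than, equal
 * to, or greater than the probe, looking only at the first arg->ndatums
 * columns.  With no datums to compare, the initial -1 is returned.
 */
static int
bound_datum_cmp(const int *bound_datums, const ProbeArg *arg)
{
    int i;
    int cmpval = -1;

    assert(arg != NULL && arg->ndatums >= 0);
    for (i = 0; i < arg->ndatums; i++)
    {
        cmpval = (bound_datums[i] > arg->datums[i]) -
                 (bound_datums[i] < arg->datums[i]);
        if (cmpval != 0)
            break;
    }

    return cmpval;
}
```

For a bound (1, 5) and a probe (1, 9), comparing both columns says the bound is smaller, whereas comparing only the first column says they are equal.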
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index abccb77393..857cd5f707 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3475,12 +3510,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3492,6 +3530,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3522,12 +3561,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3724,12 +3764,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3751,11 +3791,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3763,17 +3803,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3784,12 +3842,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3803,20 +3862,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3829,8 +3887,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
0005-Implement-get_partitions_for_keys-v1.patch (text/plain; charset=UTF-8)
From 7815b8538f85c257448709613d91d1be297e317a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 365 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 64 ++----
3 files changed, 385 insertions(+), 48 deletions(-)
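
The lookup logic in this patch leans on the contract of partition_bound_bsearch: it returns the offset of the greatest bound that is less than or equal to the probe, or -1 if every bound is greater. A minimal sketch of that contract over a sorted int array follows; bound_bsearch and range_upper_bound_offset are hypothetical names, and ints stand in for the real bound datums.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the offset of the greatest element of bounds[] that is <= probe,
 * or -1 if all elements are greater.  *is_equal reports whether the bound
 * at the returned offset equals the probe.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int probe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    assert(bounds != 0 && nbounds >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= probe)
        {
            lo = mid;
            *is_equal = (bounds[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }

    return lo;
}

/*
 * For range partitioning, the bound found is <= the probe, so the
 * partition holding the probe is the one whose upper bound sits at the
 * next offset.
 */
static int
range_upper_bound_offset(int found_offset)
{
    return found_offset + 1;
}
```

With bounds {10, 20, 30}, a probe of 25 lands on offset 1 (the bound 20), so the partition of interest is the one whose upper bound is at offset 2.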
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 857cd5f707..b7cca33f1f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2507,7 +2507,370 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal,
+ include_default = false;
+
+ /* Quick exit, if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * partset->empty will be set to true if we find out below that that's
+ * the case.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the two can hold nulls at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition, or there could be both. We already checked
+ * above if we should return the null-accepting partition, but being
+ * here means the query doesn't want nulls. All remaining data must
+ * be in the default partition if there is one.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
+ }
+
+ /*
+ * keys->eqkeys identifies a unique partition only if values for all
+ * partition keys are provided. If only a prefix of all the partition
+ * keys is provided, then they identify a sequence of partitions each of
+ * whose upper bound allows the values in that prefix.
+ */
+ eqoff = -1;
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, key must exactly match datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * The bound at offset eqoff is <= eqkeys given how
+ * partition_bound_bsearch works, so the partition we're
+ * looking for is the one whose upper bound is at offset
+ * eqoff + 1.
+ */
+ eqoff += 1;
+ break;
+ }
+ }
+
+ /* No minkeys and maxkeys to look at in this case. */
+ minoff = maxoff = -1;
+ goto collect_partitions;
+ }
+
+ /*
+ * Using minkeys and maxkeys, identify the offsets of qualifying minimum
+ * and maximum bounds.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found a bound matching the query specified value, but this
+ * may be a range query and we may have been asked to
+ * exclude the value itself. So, go to the next bound value.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Rows returned by the query will be > the bound value at
+ * minoff, because query's minkey is known to be >= that bound
+ * value given how partition_bound_bsearch works. IOW, no
+ * rows of the partition whose upper bound is the value at
+ * minoff will be returned, so go to the next bound value.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ }
+ }
+
+ /*
+ * Skip a gap, which might exist in the case of range partitioning, but
+ * also instruct the later code to consider the default partition (if one
+ * exists), since it could potentially contain data that satisfies the
+ * keys, which it only could if there are still finite keys left on
+ * that side.
+ */
+ if (minoff >= 0 && boundinfo->indexes[minoff] < 0)
+ {
+ if (keys->n_minkeys > 0 &&
+ boundinfo->kind[minoff][keys->n_minkeys - 1] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ minoff += 1;
+ }
+
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because the bound value at maxoff is known to be <= query's
+ * maxkey, we may need to consider the next partition as well.
+ * That would be in addition to the partition whose upper
+ * bound is the value at maxoff and earlier partitions.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Skip a gap, which might exist in the case of range partitioning, but
+ * also instruct the later code to consider the default partition (if one
+ * exists), since it could potentially contain data that satisfies the
+ * keys, which it only could if there are still finite keys left on
+ * that side.
+ */
+ if (maxoff >= 0 && boundinfo->indexes[maxoff] < 0)
+ {
+ if (keys->n_maxkeys > 0 &&
+ boundinfo->kind[maxoff][keys->n_maxkeys-1] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ maxoff -= 1;
+ }
+
+collect_partitions:
+ /*
+ * eqoff set to a valid bound offset means a unique partition has been
+ * identified. Otherwise, generate a sequence of partitions corresponding
+ * to bounds at offsets in range given by minoff and maxoff.
+ */
+ if (eqoff >= 0)
+ {
+ if (boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->indexes[eqoff]);
+ else
+ include_default = true;
+ }
+ else if (minoff >= 0 && maxoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* Add out-of-line partitions to the *other_parts set. */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ break;
+ }
+ }
+
+ /*
+ * Check if we need to add the null-accepting partition to the set, if
+ * it's not been already included by virtue of it being in the above
+ * range of partitions.
+ *
+ * There are a couple of cases when it will need to be scanned:
+ *
+ * 1. If there are no quals, the null partition trivially needs to be
+ * scanned.
+ *
+ * 2. If it's not already contained in *other_parts and the query does
+ * not prevent nulls in the result. If such a partition also accepts
+ * non-null datums, it would already be in *other_parts if one of those
+ * datums also qualifies for the query.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ !bms_is_member(boundinfo->null_index, partset->other_parts) &&
+ keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0 &&
+ (keys->keynullness[0] == -1 || keys->keynullness[0] != IS_NOT_NULL))
+ {
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->null_index);
+ }
+
+ /*
+ * Check if we need to add the default partition to the set. There are
+ * a couple of cases when it will need to be scanned:
+ *
+ * 1. If there are no quals, the default partition trivially needs to be
+ * scanned.
+ *
+ * 2. If there exist datums between those at offsets minoff and maxoff
+ * which don't have a partition that accepts them, then the default
+ * partition would have caught them. With range partitioning, simply
+ * check if there is a -1 in boundinfo->indexes, which indicates an
+ * unassigned portion of the range. With list partitioning, if
+ * minoff != maxoff, it means there might be datums in that range that
+ * don't have a non-default partition assigned, whereas
+ * minoff == maxoff means only the partition containing that datum
+ * needs to be scanned.
+ */
+ if (boundinfo->default_index >= 0)
+ {
+ if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ include_default = true;
+ else
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ include_default = (minoff != maxoff);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_default = true;
+ break;
+ }
+ }
+ break;
+ }
+ }
+
+ if (include_default)
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+
+ if (partset->min_part_idx < 0 && partset->max_part_idx < 0 &&
+ partset->other_parts == NULL)
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 61c4596bc7..69a7819171 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -198,16 +198,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -453,15 +451,13 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3nullxy
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(9 rows)
+(7 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -475,16 +471,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -502,9 +496,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -514,9 +506,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -530,9 +520,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -571,16 +559,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -592,9 +578,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -621,8 +605,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -630,9 +614,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -672,9 +654,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -712,9 +692,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -726,8 +704,6 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
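The equality lookup for range partitioning in the patch above (find the greatest bound that is <= the key, then step to the next offset to get the partition whose upper bound lies there) can be sketched in isolation. This is only a rough illustration under simplifying assumptions, not the patch's code: bounds are plain ints with no MINVALUE/MAXVALUE entries, and bound_bsearch(), range_partition_for_key(), rlp_bounds and rlp_indexes are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for partition_bound_bsearch(), over plain ints:
 * returns the offset of the greatest bound that is <= key, or -1 if every
 * bound is greater than key.  *is_equal reports an exact match.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int key, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;
	while (lo < hi)
	{
		int			mid = lo + (hi - lo + 1) / 2;

		if (bounds[mid] <= key)
		{
			lo = mid;
			*is_equal = (bounds[mid] == key);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}

/*
 * Assumed layout: indexes[] has nbounds + 1 entries.  indexes[i] is the
 * partition whose upper bound sits at offset i (the last entry covers keys
 * above every bound), and -1 marks a gap that only a default partition
 * could cover.
 */
static int
range_partition_for_key(const int *bounds, const int *indexes,
						int nbounds, int key)
{
	bool		is_equal;
	int			off = bound_bsearch(bounds, nbounds, key, &is_equal);

	/* The bound at off is <= key, so the upper bound is at off + 1. */
	return indexes[off + 1];
}

/* Simplified layout modelled on the rlp test table (finite bounds only). */
static const int rlp_bounds[] = {1, 10, 15, 20, 30, 31};
static const int rlp_indexes[] = {0, 1, -1, 2, 3, -1, 4};
```

With this assumed layout, key 1 maps to index 1 (standing in for rlp2) and key 30 lands in a gap (-1), which is consistent with the "a = 1" and "a = 30" plans in the regression output attached to this message.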
Attachment: 0001-Add-new-tests-for-partition-pruning-v1.patch (text/plain; charset=UTF-8)
From 6410811627a92e3544385dd226f637d1399a4ae3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 733 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 105 +++++
4 files changed, 840 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..61c4596bc7
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,733 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(17 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(13 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(13 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(17 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(13 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(13 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(9 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+drop table lp, coll_pruning, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 53d4f49197..834f057d79 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index ed1df5ae24..f74615b0a2 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..d716fee69f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,105 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
0002-Planner-side-changes-for-partition-pruning-v1.patch (text/plain; charset=UTF-8)
From 5ba062828dfc83dff2a3521fce7532b2afc2c7b9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers to the
AppendRelInfos of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size()
normally sets it, but for partitioned tables it will do so only
for the *live* partitions, whereas partitionwise-join code
looks at *all* partitions)
If the clauses identified in 2 above do not contain values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
when all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each of the matched clauses will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 6 +
src/include/nodes/relation.h | 31 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 779 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 5535b63803..536ef22c58 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned in
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests (IS NULL / IS NOT NULL) against partition keys; those tests are
+ * returned in the same list as the operator clauses.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column;
+ * 3. must have an input collation that matches the partitioning collation.
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result list as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely
+ * about the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check that its negator
+ * exists and that the latter is compatible with
+ * partitioning. If it is, we turn this into an OR BoolExpr:
+ * (key < val OR key > val), if the partitioning method
+ * supports such a notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to the result. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of a Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For partitioned tables, we collect its live partitioned children
+ * from rel->painfo. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpcted rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ecdd7280eb..d9bbf20acb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6160,14 +6160,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..e74a87035e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -101,6 +101,7 @@ extern int get_partition_for_tuple(PartitionDispatch *pd,
EState *estate,
PartitionDispatchData **failed_at,
TupleTableSlot **failed_slot);
+
extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
extern Oid get_default_partition_oid(Oid parentId);
extern void update_default_partition_oid(Oid parentId, Oid defaultPartId);
@@ -108,4 +109,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..e47f6e5cd3 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not same as the collation implied by column's type, store
+ * the same separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -575,6 +581,8 @@ typedef enum RelOptKind
((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
(rel)->reloptkind == RELOPT_OTHER_JOINREL)
+typedef struct AppendRelInfo AppendRelInfo;
+
typedef struct RelOptInfo
{
NodeTag type;
@@ -657,10 +665,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
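For readers skimming the patch above: process_partition_ne_op() turns a `key <> value` clause into `key < value OR key > value`, so that the less-than/greater-than machinery can be reused for pruning. The following is only a minimal sketch of that idea, not code from the patch. The ListPartitionSketch type and partition_survives_ne() are hypothetical names, each character of `values` stands for one accepted list value, and NULL and default partitions are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list partition; not a PostgreSQL structure. */
typedef struct ListPartitionSketch
{
    const char *name;
    const char *values;     /* each character is one accepted key value */
} ListPartitionSketch;

/*
 * Decide whether a partition can contain rows satisfying "key <> value"
 * by evaluating the rewritten form "key < value OR key > value" against
 * every value the partition accepts.
 */
static bool
partition_survives_ne(const ListPartitionSketch *part, char value)
{
    const char *v;

    assert(part != NULL && part->values != NULL);
    for (v = part->values; *v != '\0'; v++)
    {
        bool        lt = (*v < value);
        bool        gt = (*v > value);

        if (lt || gt)
            return true;    /* at least one accepted value differs */
    }
    return false;           /* every accepted value equals "value" */
}
```

With a partition accepting only 'g', the clause `a <> 'g'` leaves nothing to scan, which is consistent with lp_g being absent from the expected plan for that query in the regression test.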
On Thu, Oct 19, 2017 at 12:16 PM, Amit Langote <
Langote_Amit_f8@lab.ntt.co.jp> wrote:
Description of the attached patches:
0001: add new tests for partition-pruning
0002: patch that makes all the changes needed in the planner (adds a stub
function in partition.c)
0003: patch that implements the aforementioned stub (significant amount of
code to analyze partition clauses and gin up bounding keys to
compare with the values in PartitionBoundInfo; the actual function
that will do the comparison is just a stub as of this patch)
0004: make some preparatory changes to partition_bound_cmp/bsearch, to be
able to pass incomplete partition keys (aka, prefix of a
multi-column key) for comparison with the values in PartitionBoundInfo
(just a refactoring patch)
0005: implements the stub mentioned in 0003 and finally gets the new
partition-pruning working (also disables constraint exclusion using
internal partition constraints by teaching get_relation_constraints
to not include those).

Feedback greatly welcome.
Hi Amit,
I have tried to apply the attached patch. The patch applied cleanly on commit id
bf54c0f05c0a58db17627724a83e1b6d4ec2712c,
but make failed with the error below.
../../../../src/include/nodes/relation.h:2126: error: redefinition of
typedef ‘AppendRelInfo’
../../../../src/include/nodes/relation.h:584: note: previous declaration of
‘AppendRelInfo’ was here
make[4]: *** [gistbuild.o] Error 1
Thanks Rajkumar.
On 2017/10/23 15:58, Rajkumar Raghuwanshi wrote:
I have tried to apply the attached patch. The patch applied cleanly on commit id
bf54c0f05c0a58db17627724a83e1b6d4ec2712c,
but make failed with the error below.
../../../../src/include/nodes/relation.h:2126: error: redefinition of
typedef ‘AppendRelInfo’
../../../../src/include/nodes/relation.h:584: note: previous declaration of
‘AppendRelInfo’ was here
make[4]: *** [gistbuild.o] Error 1
The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
this typedef redefinition introduced by the 0002 patch, but it seems that
it's not needed anyway, so got rid of that line in the attached updated patch.
Fixed one more useless diff in 0002, but no changes in any other patch.
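As background on why one compiler complains and another does not: repeating a typedef for the same type in the same scope is only allowed as of C11, and newer GCC releases default to a C11-based dialect, whereas older compilers or stricter modes reject it. That is a likely explanation here rather than something verified on both machines. The sketch below shows the pattern that works everywhere, a bare `struct` forward declaration with the typedef appearing only once; the type and function names carry a Sketch suffix to make clear they are illustrative and not the definitions in relation.h.

```c
#include <assert.h>
#include <stddef.h>

/* A bare struct forward declaration may be repeated in any C standard. */
struct AppendRelInfoSketch;

typedef struct RelOptInfoSketch
{
    int         nparts;
    /* Refer to the struct tag directly, so no early typedef is needed. */
    struct AppendRelInfoSketch **part_appinfos;
} RelOptInfoSketch;

/* The single typedef, appearing later in the same "header". */
typedef struct AppendRelInfoSketch
{
    unsigned int parent_relid;
    unsigned int child_relid;
} AppendRelInfoSketch;

/* Return the child RT index recorded for the first partition. */
static unsigned int
first_child_relid(const RelOptInfoSketch *rel)
{
    assert(rel != NULL && rel->nparts > 0);
    return rel->part_appinfos[0]->child_relid;
}
```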
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v2.patch (text/plain; charset=UTF-8)
From 17fedc446f25a397d86d8e65ad9e6478f6252cd4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 733 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 105 +++++
4 files changed, 840 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..61c4596bc7
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,733 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(17 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(13 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(13 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(17 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(13 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(13 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(9 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+drop table lp, coll_pruning, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..d716fee69f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,105 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
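A note on reading the rlp expected plans above: the bounds leave gaps at [10, 15) and at [30, 31), which is why queries such as "a = 10" or "a = 30" end up scanning only rlp_default. The following is a minimal sketch of that routing, not the partitioning code in the tree. The RangePart struct and rlp_route() are hypothetical names made up for illustration, and rlp3's own list sub-partitioning on b is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one range partition: [lo, hi) with optional ends. */
typedef struct
{
    const char *name;
    int         has_lo;     /* 0 means lower bound is MINVALUE */
    int         lo;         /* inclusive lower bound */
    int         has_hi;     /* 0 means upper bound is MAXVALUE */
    int         hi;         /* exclusive upper bound */
} RangePart;

/* Bounds copied from the rlp test table definitions. */
static const RangePart rlp_parts[] = {
    {"rlp1", 0, 0, 1, 1},       /* from (minvalue) to (1) */
    {"rlp2", 1, 1, 1, 10},      /* from (1) to (10) */
    {"rlp3", 1, 15, 1, 20},     /* from (15) to (20) */
    {"rlp4", 1, 20, 1, 30},     /* from (20) to (30) */
    {"rlp5", 1, 31, 0, 0},      /* from (31) to (maxvalue) */
};

/* Return the name of the partition that accepts key a, else the default. */
static const char *
rlp_route(int a)
{
    size_t      n = sizeof(rlp_parts) / sizeof(rlp_parts[0]);
    size_t      i;

    assert(n > 0);
    for (i = 0; i < n; i++)
    {
        const RangePart *p = &rlp_parts[i];

        if ((!p->has_lo || a >= p->lo) && (!p->has_hi || a < p->hi))
            return p->name;
    }
    /* Value falls into a gap between the defined ranges. */
    return "rlp_default";
}
```

A linear scan is used here only for brevity; the point is the half-open bound semantics, which explain the "only default" comments in the tests.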
0002-Planner-side-changes-for-partition-pruning-v2.patch
From 6b8c2a466faae16578bd1727481576a6b727a905 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size() does
that normally, but for partitioned tables, it will only do it
for the *live* partitions, but partitionwise-join code would
look at *all* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
when all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each of the matched clauses will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
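For illustration, here is roughly how a caller might flatten the stub's three outputs (a contiguous index range plus a set of stray indexes) into one sorted array. This is only a sketch under assumptions: collect_part_indexes() and cmp_int() are hypothetical names, and a plain int array stands in for the Bitmapset used by the actual patch.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator ordering ints ascending. */
static int
cmp_int(const void *va, const void *vb)
{
    int         a = *(const int *) va;
    int         b = *(const int *) vb;

    return (a > b) - (a < b);
}

/*
 * Hypothetical helper: copy the indexes min_idx..max_idx and the nothers
 * entries of others[] into out[], sort them, and return how many were
 * written.  A negative min_idx or max_idx means there is no contiguous
 * range.  The caller must size out[] to hold every index.
 */
static int
collect_part_indexes(int min_idx, int max_idx,
                     const int *others, int nothers,
                     int *out)
{
    int         n = 0;
    int         i;

    assert(out != NULL);

    if (min_idx >= 0 && max_idx >= 0)
    {
        for (i = min_idx; i <= max_idx; i++)
            out[n++] = i;
    }
    for (i = 0; i < nothers; i++)
        out[n++] = others[i];

    /* Sorting keeps the result in partition-descriptor order. */
    if (n > 1)
        qsort(out, n, sizeof(int), cmp_int);

    return n;
}
```

With the stub as it stands, the range is always 0..nparts-1 and the extra set is empty, so the result is simply every partition index in order.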
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 776 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..862309263d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * an invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column;
+ * 3. must have an input collation that matches the partitioning collation.
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert(partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely about
+ * the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with
+ * partitioning. If it is, we turn this into an OR BoolExpr:
+ * (key < val OR key > val), if the partitioning method
+ * supports such a notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and generate its PartClauseSetOr. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of a Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list will contain only
+ * its immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate them to our
+ * list before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of root partitioned tables, get the
+ * partitioned_rels list by combining the live_partitioned_rels of
+ * the component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ecdd7280eb..d9bbf20acb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6160,14 +6160,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
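
For readers skimming the patch above, the index-gathering step in what appears to be get_append_rel_partitions() can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions, not the patch's code: collect_part_indexes() is a hypothetical name, a plain int array stands in for the other_parts Bitmapset, malloc() stands in for palloc(), and the sketch additionally requires max_idx >= min_idx, which the patch does not check explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints; written to avoid overflow of a subtraction */
static int
intcmp(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical stand-in for the index-gathering step.  Combine the
 * contiguous range [min_idx, max_idx] (ignored if min_idx is negative or
 * max_idx < min_idx) with the indexes in others[0 .. nothers - 1], and
 * return them sorted in a malloc'd array.  *count receives the number of
 * entries; NULL is returned when there are none.
 */
static int *
collect_part_indexes(int min_idx, int max_idx,
					 const int *others, int nothers, int *count)
{
	int			has_range = (min_idx >= 0 && max_idx >= min_idx);
	int			n = (has_range ? max_idx - min_idx + 1 : 0) + nothers;
	int		   *all;
	int			i;
	int			j = 0;

	*count = n;
	if (n == 0)
		return NULL;

	all = malloc(n * sizeof(int));
	if (all == NULL)
	{
		*count = 0;
		return NULL;
	}

	/* First the contiguous range, then the scattered extra indexes. */
	if (has_range)
	{
		for (i = min_idx; i <= max_idx; i++)
			all[j++] = i;
	}
	for (i = 0; i < nothers; i++)
		all[j++] = others[i];
	assert(j == n);

	/* Sort so that callers can walk partitions in bound order. */
	if (j > 1)
		qsort(all, j, sizeof(int), intcmp);

	return all;
}
```

For instance, a range of 2..4 combined with extra indexes {9, 0} would give 0, 2, 3, 4, 9. Sorting matters because the appinfos are later fetched via rel->part_appinfos[] in partition-descriptor order.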
0003-Implement-get_partitions_from_clauses-v2.patch (text/plain; charset=UTF-8)
From 44db5602dde3508c0037e3b026c3477387b90f07 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1034 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1031 insertions(+), 3 deletions(-)
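To illustrate what "classifies them into a set of keys" means for a single key column, here is a minimal standalone sketch. It is not code from the patch: SketchClause, SketchBounds and sketch_classify are hypothetical names, the column is assumed to be a plain int, and the real code works on OpExprs and btree strategy numbers instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified clause of the form "col <op> value". */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT } SketchOp;

typedef struct
{
	SketchOp	op;
	int			value;
} SketchClause;

/* Hypothetical counterpart of the min/max parts of PartitionScanKeyInfo. */
typedef struct
{
	bool		has_min, min_incl;
	int			min;
	bool		has_max, max_incl;
	int			max;
	bool		constfalse;		/* clauses contradict each other */
} SketchBounds;

/* Replace the lower bound if (value, incl) is more restrictive. */
static void
tighten_min(SketchBounds *b, int value, bool incl)
{
	if (!b->has_min || value > b->min || (value == b->min && !incl))
	{
		b->has_min = true;
		b->min = value;
		b->min_incl = incl;
	}
}

/* Replace the upper bound if (value, incl) is more restrictive. */
static void
tighten_max(SketchBounds *b, int value, bool incl)
{
	if (!b->has_max || value < b->max || (value == b->max && !incl))
	{
		b->has_max = true;
		b->max = value;
		b->max_incl = incl;
	}
}

/* Reduce mutually ANDed clauses on one column to the best min/max bound. */
static SketchBounds
sketch_classify(const SketchClause *clauses, int nclauses)
{
	SketchBounds b = {0};
	int			i;

	for (i = 0; i < nclauses; i++)
	{
		int			v = clauses[i].value;

		switch (clauses[i].op)
		{
			case OP_GT: tighten_min(&b, v, false); break;
			case OP_GE: tighten_min(&b, v, true); break;
			case OP_LT: tighten_max(&b, v, false); break;
			case OP_LE: tighten_max(&b, v, true); break;
			case OP_EQ:
				/* An equality clause bounds the column from both sides. */
				tighten_min(&b, v, true);
				tighten_max(&b, v, true);
				break;
		}
	}

	/* An empty interval means no row can satisfy all the clauses. */
	if (b.has_min && b.has_max &&
		(b.min > b.max ||
		 (b.min == b.max && !(b.min_incl && b.max_incl))))
		b.constfalse = true;

	return b;
}
```

With a > 1, a > 2 and a >= 5 as input, the sketch keeps 5 as an inclusive lower bound; with a > 1 and a = 0 it reports a contradiction, which corresponds to setting *constfalse in the patch.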
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..abccb77393 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartitionScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartitionScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartitionScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set). Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include the default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static Datum partkey_datum_from_expr(const Expr *expr);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartitionScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1439,15 +1554,928 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionSet *partset;
+
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartitionScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, rt_index,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and return it as a Datum
+ */
+
+static Datum
+partkey_datum_from_expr(const Expr *expr)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ return ((Const *) expr)->constvalue;
+
+ default:
+ elog(ERROR, "invalid expression for partition key");
+ }
+
+ Assert(false); /* should never get here! */
+ return 0;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->use_range = in->use_range;
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (!a->empty && !b->all_parts)
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if(!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because the
+ * indexes in the set are no longer contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any nullness constraints, and return that information in the output
+ * argument *keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and max bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ memset(keynullness, 0, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ keynullness[i] = -1;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+ int j; /* array index; i tracks the key column */
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (j = 0; j < num_elems; j++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[j])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to *or_clauses. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Otherwise, don't pass any equal
+ * keys at all.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate the output PartitionScanKeyInfo. */
+ for (i = 0; i < n_eqkeys; i++)
+ keys->eqkeys[i] = partkey_datum_from_expr(eqkey_exprs[i]);
+ keys->n_eqkeys = n_eqkeys;
+
+ for (i = 0; i < n_minkeys; i++)
+ keys->minkeys[i] = partkey_datum_from_expr(minkey_exprs[i]);
+ keys->n_minkeys = n_minkeys;
+ keys->min_incl = min_incl;
+
+ for (i = 0; i < n_maxkeys; i++)
+ keys->maxkeys[i] = partkey_datum_from_expr(maxkey_exprs[i]);
+ keys->n_maxkeys = n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+
+ return n_eqkeys + n_minkeys + n_maxkeys + n_keynullness;
+}
+
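+/*
+ * remove_redundant_clauses
+ * Of the clauses in all_clauses, which all match the partition key
+ * column at offset partattoff, keep only the most restrictive one per
+ * btree strategy and return them in *result
+ *
+ * *constfalse is set to true if the clauses are found to be mutually
+ * contradictory.
+ */
+static void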
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause of each btree strategy for the
+ * partition key column at offset partattoff. Copy them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ leftarg_const = partkey_datum_from_expr(leftarg->constarg);
+ rightarg_const = partkey_datum_from_expr(rightarg->constarg);
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartitionScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
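The way PartitionSet switches between its range and bitmap representations during a union can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: SketchSet and the sketch_* functions are hypothetical names, a 64-bit mask stands in for Bitmapset (so at most 64 partitions), and the empty/all_parts flags are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionSet without the empty/all_parts flags. */
typedef struct
{
	bool		use_range;
	int			min_idx;		/* inclusive; -1 if the range is empty */
	int			max_idx;		/* inclusive; -1 if the range is empty */
	uint64_t	others;			/* partition indexes outside the range */
} SketchSet;

static bool
sketch_range_empty(const SketchSet *s)
{
	return s->min_idx < 0 && s->max_idx < 0;
}

/* Bits lo..hi set; assumes fewer than 64 partitions. */
static uint64_t
sketch_range_mask(int lo, int hi)
{
	uint64_t	m = 0;
	int			i;

	assert(lo >= 0 && hi < 64);
	for (i = lo; i <= hi; i++)
		m |= (uint64_t) 1 << i;
	return m;
}

static SketchSet
sketch_range(int lo, int hi)
{
	SketchSet	s = {true, lo, hi, 0};

	return s;
}

/* All members of the set as one mask, whatever the representation. */
static uint64_t
sketch_members(const SketchSet *s)
{
	uint64_t	m = s->others;

	if (!sketch_range_empty(s))
		m |= sketch_range_mask(s->min_idx, s->max_idx);
	return m;
}

static SketchSet
sketch_union(SketchSet a, SketchSet b)
{
	a.others |= b.others;
	if (sketch_range_empty(&b))
		return a;

	if (!a.use_range)
		a.others |= sketch_range_mask(b.min_idx, b.max_idx);
	else if (sketch_range_empty(&a))
	{
		a.min_idx = b.min_idx;
		a.max_idx = b.max_idx;
	}
	else if (a.min_idx <= b.max_idx + 1 && b.min_idx <= a.max_idx + 1)
	{
		/* Overlapping or adjacent: the result is still one range. */
		a.min_idx = a.min_idx < b.min_idx ? a.min_idx : b.min_idx;
		a.max_idx = a.max_idx > b.max_idx ? a.max_idx : b.max_idx;
	}
	else
	{
		/* Disjoint: fall back to listing the individual indexes. */
		a.others |= sketch_range_mask(a.min_idx, a.max_idx);
		a.others |= sketch_range_mask(b.min_idx, b.max_idx);
		a.use_range = false;
		a.min_idx = a.max_idx = -1;
	}
	return a;
}
```

For example, the union of partitions 0..1 and 2..3 stays a single range 0..3, whereas the union of 0..1 and 4..5 ends up as the individual indexes 0, 1, 4 and 5 with use_range turned off.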
Attachment: 0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v2.patch (text/plain; charset=UTF-8)
From 7ffaa0cbc1e63aa2b18350ee78f0b56bc94e4263 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
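As a rough illustration of why the number of datums needs to be caller-specified, the sketch below searches sorted multi-column bounds using only a prefix of the key. It is an assumption-laden toy, not the patch's code: sketch_prefix_cmp and sketch_bound_bsearch are hypothetical names, keys are plain ints, and there is no handling of unbounded or lower/upper range datums.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NKEYS 2			/* hypothetical number of partition key columns */

/* Compare a stored bound with the first ndatums values of the probe. */
static int
sketch_prefix_cmp(const int *bound_datums, const int *probe_datums,
				  int ndatums)
{
	int			i;

	for (i = 0; i < ndatums; i++)
	{
		if (bound_datums[i] < probe_datums[i])
			return -1;
		if (bound_datums[i] > probe_datums[i])
			return 1;
	}
	return 0;
}

/*
 * Return the index of the greatest bound that is <= the probe on the
 * compared prefix, or -1 if there is none.  *is_equal tells whether that
 * bound matched the probe exactly on the prefix.
 */
static int
sketch_bound_bsearch(int bounds[][SKETCH_NKEYS], int nbounds,
					 const int *probe_datums, int ndatums, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmp = sketch_prefix_cmp(bounds[mid], probe_datums,
											ndatums);

		if (cmp <= 0)
		{
			lo = mid;
			*is_equal = (cmp == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

With bounds (1, 10), (2, 5), (3, 0), probing with just {2} and ndatums = 1 finds the second bound as an exact match, while probing with {2, 7} and ndatums = 2 finds the same bound but not as an equal one.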
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index abccb77393..857cd5f707 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3475,12 +3510,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3492,6 +3530,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3522,12 +3561,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3724,12 +3764,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3751,11 +3791,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3763,17 +3803,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3784,12 +3842,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3803,20 +3862,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3829,8 +3887,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
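The effect of comparing only a caller-specified number of datums, as the 0004 patch above does in partition_rbound_datum_cmp, can be illustrated with a minimal standalone sketch. It uses plain ints instead of Datums and a direct integer comparison instead of the partition support function; the names rbound_prefix_cmp and BoundKind are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for PartitionRangeDatumKind. */
typedef enum
{
	KIND_MINVALUE = -1,
	KIND_VALUE = 0,
	KIND_MAXVALUE = 1
} BoundKind;

/*
 * Compare the first nprobe columns of a range bound with a probe tuple.
 * Returns <0, 0 or >0 if the bound is less than, equal to, or greater than
 * the probe on that prefix.  Columns beyond nprobe are never looked at.
 */
static int
rbound_prefix_cmp(const int *bound, const BoundKind *kind,
				  const int *probe, int nprobe)
{
	int			i;
	int			cmpval = -1;

	assert(nprobe >= 0);
	for (i = 0; i < nprobe; i++)
	{
		/* An unbounded column decides the comparison immediately. */
		if (kind[i] == KIND_MINVALUE)
			return -1;
		if (kind[i] == KIND_MAXVALUE)
			return 1;

		cmpval = (bound[i] > probe[i]) - (bound[i] < probe[i]);
		if (cmpval != 0)
			break;
	}

	return cmpval;
}
```

With a bound of (10, 5, 1) and a one-column probe (10), the sketch returns 0, and it would do so for every bound whose first column is 10. That is presumably why the 0005 patch steps over neighbouring bounds for as long as the comparison keeps returning 0 when fewer keys than partnatts are supplied.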
0005-Implement-get_partitions_for_keys-v2.patch (text/plain)
From c9a097462bc99b2450bbb9886fadbbf47555468f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 365 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 64 ++----
3 files changed, 385 insertions(+), 48 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 857cd5f707..b7cca33f1f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2507,7 +2507,370 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal,
+ include_default = false;
+
+ /* Quick exit, if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * partset->empty will be set to true if we find out below that that's
+ * the case.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the two can hold nulls at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo)||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition, or there could be both. We already checked
+ * above if we should return the null-accepting partition, but being
+ * here means the query doesn't want nulls. All remaining data must
+ * be in the default partition if there is one.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
+ }
+
+ /*
+ * keys->eqkeys identifies a unique partition only if values for all
+ * partition keys are provided. If only a prefix of all the partition
+ * keys is provided, then they identify a sequence of partitions each of
+ * whose upper bound allows the values in that prefix.
+ */
+ eqoff = -1;
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, key must exactly match datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * The bound at offset eqoff is <= eqkeys given how
+ * partition_bound_bsearch works, so the partition we're
+ * looking for is the one whose upper bound is at offset
+ * eqoff + 1.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /* No minkeys and maxkeys to look at in this case. */
+ minoff = maxoff = -1;
+ goto collect_partitions;
+ }
+
+ /*
+ * Using minkeys and maxkeys, identify the offsets of qualifying minimum
+ * and maximum bounds.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found a bound matching the query specified value, but this
+ * may be a range query and we may have been asked to
+ * exclude the value itself. So, go to the next bound value.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Rows returned by the query will be > the bound value at
+ * minoff, because query's minkey is known to be >= that bound
+ * value given how partition_bound_bsearch works. IOW, no
+ * rows of the partition whose upper bound is the value at
+ * minoff will be returned, so go to the next bound value.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ }
+ }
+
+ /*
+ * Skip a gap, which might exist in the case of range partitioning, but
+ * also instruct the later code to consider default partition (if one
+ * exists) and could potentially contain data that satisfies the keys,
+ * which it only could if there are still finite keys left on
+ * that side.
+ */
+ if (minoff >= 0 && boundinfo->indexes[minoff] < 0)
+ {
+ if (keys->n_minkeys > 0 &&
+ boundinfo->kind[minoff][keys->n_minkeys - 1] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ minoff += 1;
+ }
+
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /* Interpret the result per partition strategy. */
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Found, but the query may have asked us to exclude it.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * Because the bound value at maxoff is known to be <= query's
+ * maxkey, we may need to consider the next partition as well.
+ * That would be in addition to the partition whose upper
+ * bound is the value at maxoff and earlier partitions.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Skip a gap, which might exist in the case of range partitioning, but
+ * also instruct the later code to consider default partition (if one
+ * exists) and could potentially contain data that satisfies the keys,
+ * which it only could if there are still finite keys left on
+ * that side.
+ */
+ if (maxoff >= 0 && boundinfo->indexes[maxoff] < 0)
+ {
+ if (keys->n_maxkeys > 0 &&
+ boundinfo->kind[maxoff][keys->n_maxkeys-1] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ maxoff -= 1;
+ }
+
+collect_partitions:
+ /*
+ * eqoff set to a valid bound offset means a unique partition has been
+ * identified. Otherwise, generate a sequence of partitions corresponding
+ * to bounds at offsets in range given by minoff and maxoff.
+ */
+ if (eqoff >= 0)
+ {
+ if (boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->indexes[eqoff]);
+ else
+ include_default = true;
+ }
+ else if (minoff >= 0 && maxoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* Add out-of-line partitions to the *other_parts set. */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ break;
+ }
+ }
+
+ /*
+ * Check if we need to add the null-accepting partition to the set, if
+ * it's not been already included by virtue of it being in the above
+ * range of partitions.
+ *
+ * There are couple of cases when it will need to be scanned:
+ *
+ * 1. If there are no quals, null partition trivially needs to be
+ * scanned.
+ *
+ * 2. If it's not already contained in *other_parts and the query does
+ * not prevent nulls in the result. If such a partition also accepts
+ * non-null datums, it would already be in *other_parts if one of those
+ * datums also qualify for the query.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ !bms_is_member(boundinfo->null_index, partset->other_parts) &&
+ keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0 &&
+ (keys->keynullness[0] == -1 || keys->keynullness[0] != IS_NOT_NULL))
+ {
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->null_index);
+ }
+
+ /*
+ * Check if we need to add the default partition to the set. There are
+ * couple of cases when it will need to be scanned:
+ *
+ * 1. If there are no quals, default partitions trivially needs to be
+ * scanned.
+ *
+ * 2. If there exist datums between those at offsets minoff and maxoff
+ * which don't have a partition that accepts them, then default partition
+ * would have caught them. With range partitioning, simply check if
+ * there is a -1 in boundinfo->indexes, which indicates unassigned
+ * portion of range. With list partitioning, if minoff != maxoff, it
+ * means there might be datums in that range that don't have a
+ * non-default partition assigned, whereas minoff == maxoff means only
+ * the partition containing that datum needs to be scanned.
+ * containing that datum needs to be scanned.
+ */
+ if (boundinfo->default_index >= 0)
+ {
+ if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ include_default = true;
+ else
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ include_default = (minoff != maxoff);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_default = true;
+ break;
+ }
+ }
+ break;
+ }
+ }
+
+ if (include_default)
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+
+ if (partset->min_part_idx < 0 && partset->max_part_idx < 0 &&
+ partset->other_parts == NULL)
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 61c4596bc7..69a7819171 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -198,16 +198,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -453,15 +451,13 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3nullxy
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(9 rows)
+(7 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -475,16 +471,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -502,9 +496,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -514,9 +506,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -530,9 +520,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -571,16 +559,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -592,9 +578,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -621,8 +605,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -630,9 +614,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -672,9 +654,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -712,9 +692,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -726,8 +704,6 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
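The contract that get_partitions_for_keys relies on (partition_bound_bsearch returns the offset of the greatest bound that is less than or equal to the probe, or -1, and for range partitioning the matching partition is the one whose upper bound sits at offset + 1) can be sketched in isolation. This is a simplified model with sorted int bounds rather than Datums; bound_bsearch and range_partition_for_value are hypothetical names, and indexes[] is assumed to have one more entry than bounds[], with -1 marking a range that no partition covers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the offset of the greatest bound that is <= probe, or -1 if every
 * bound is greater.  *is_equal says whether that bound equals the probe.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int probe, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= probe)
		{
			lo = mid;
			*is_equal = (bounds[mid] == probe);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}

	return lo;
}

/*
 * Map a value to a partition index: the bound at the returned offset is
 * <= value, so the partition holding value is the one whose upper bound is
 * at offset + 1.  A result of -1 means the value falls into a gap.
 */
static int
range_partition_for_value(const int *bounds, int nbounds,
						  const int *indexes, int value)
{
	bool		is_equal;
	int			off = bound_bsearch(bounds, nbounds, value, &is_equal);

	return indexes[off + 1];
}
```

For bounds {1, 10} and indexes {-1, 0, 1}, a value of 5 maps to partition 0, a value of 10 maps to partition 1, and a value of 0 maps to -1 because nothing covers the range below the lowest bound.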
On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
this typedef redefinition introduced by the 0002 patch, but it seems that
it's not needed anyway, so got rid of that line in the attached updated
patch.
Fixed one more useless diff in 0002, but no changes in any other patch
Thanks for updated patches, I am able to compile it on head.
While testing this, I observed that pruning is not scanning the default
partition, leading to wrong output. Below is a test to reproduce this.
create table rp (a int, b varchar) partition by range (a);
create table rp_p1 partition of rp default;
create table rp_p2 partition of rp for values from (1) to (10);
create table rp_p3 partition of rp for values from (10) to (maxvalue);
insert into rp values (-1,'p1');
insert into rp values (1,'p2');
insert into rp values (11,'p3');
postgres=# select tableoid::regclass,* from rp;
tableoid | a | b
----------+----+----
rp_p2 | 1 | p2
rp_p3 | 11 | p3
rp_p1 | -1 | p1
(3 rows)
--with pruning
postgres=# explain (costs off) select * from rp where a <= 1;
QUERY PLAN
--------------------------
Append
-> Seq Scan on rp_p2
Filter: (a <= 1)
(3 rows)
postgres=# select * from rp where a <= 1;
a | b
---+----
1 | p2
(1 row)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
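Why the default partition has to be scanned for `a <= 1` in the test above can be shown with a small standalone model. It is only a sketch under simplifying assumptions: the bounds are plain ints, rp_p3 is modelled as covering [10, +infinity), and indexes[i] holds the partition covering the interval that ends at bounds[i] (with one extra trailing entry), or -1 when no range partition covers it. The function names are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of bounds that are <= v, i.e. the interval slot v falls into. */
static int
slot_for_value(const int *bounds, int nbounds, int v)
{
	int			slot = 0;

	assert(nbounds >= 0);
	while (slot < nbounds && bounds[slot] <= v)
		slot++;

	return slot;
}

/*
 * For a predicate "a <= maxval", report whether any interval that can hold
 * matching values has no range partition.  Rows in such an interval can
 * only live in the default partition, so it must be scanned too.
 */
static bool
le_query_needs_default(const int *bounds, int nbounds,
					   const int *indexes, int maxval)
{
	int			last = slot_for_value(bounds, nbounds, maxval);

	for (int i = 0; i <= last; i++)
	{
		if (indexes[i] < 0)
			return true;
	}

	return false;
}
```

Modelling rp as bounds {1, 10} with indexes {-1, 0, 1}, the interval below 1 has no partition, so `a <= 1` needs the default partition rp_p1 in addition to rp_p2. That matches the row (-1, 'p1') missing from the pruned result shown above.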
On Mon, Oct 23, 2017 at 3:24 PM, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
this typedef redefinition introduced by the 0002 patch, but it seems that
it's not needed anyway, so got rid of that line in the attached updated
patch.
Fixed one more useless diff in 0002, but no changes in any other patch
Thanks for updated patches, I am able to compile it on head.
While testing this, I observed that pruning is not scanning the default
partition, leading to wrong output. Below is a test to reproduce this.
create table rp (a int, b varchar) partition by range (a);
create table rp_p1 partition of rp default;
create table rp_p2 partition of rp for values from (1) to (10);
create table rp_p3 partition of rp for values from (10) to (maxvalue);
insert into rp values (-1,'p1');
insert into rp values (1,'p2');
insert into rp values (11,'p3');
postgres=# select tableoid::regclass,* from rp;
tableoid | a | b
----------+----+----
rp_p2 | 1 | p2
rp_p3 | 11 | p3
rp_p1 | -1 | p1
(3 rows)
--with pruning
postgres=# explain (costs off) select * from rp where a <= 1;
QUERY PLAN
--------------------------
Append
-> Seq Scan on rp_p2
Filter: (a <= 1)
(3 rows)
postgres=# select * from rp where a <= 1;
a | b
---+----
1 | p2
(1 row)
I had noticed this, and also this crash:
tprt PARTITION BY RANGE(Col1)
tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
tprt_11 FOR VALUES FROM (1) TO (10000),
tprt_1d DEFAULT
tprt_2 FOR VALUES FROM (50001) TO (100001)
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks a lot Rajkumar and Beena for the tests.
On 2017/10/24 1:38, Beena Emerson wrote:
On Mon, Oct 23, 2017 at 3:24 PM, Rajkumar Raghuwanshi wrote:
Thanks for updated patches, I am able to compile it on head.
While testing this, I observed that pruning is not scanning the default
partition, leading to wrong output. Below is a test to reproduce this.
create table rp (a int, b varchar) partition by range (a);
create table rp_p1 partition of rp default;
create table rp_p2 partition of rp for values from (1) to (10);
create table rp_p3 partition of rp for values from (10) to (maxvalue);
insert into rp values (-1,'p1');
insert into rp values (1,'p2');
insert into rp values (11,'p3');
postgres=# select tableoid::regclass,* from rp;
tableoid | a | b
----------+----+----
rp_p2 | 1 | p2
rp_p3 | 11 | p3
rp_p1 | -1 | p1
(3 rows)
--with pruning
postgres=# explain (costs off) select * from rp where a <= 1;
QUERY PLAN
--------------------------
Append
-> Seq Scan on rp_p2
Filter: (a <= 1)
(3 rows)
postgres=# select * from rp where a <= 1;
a | b
---+----
1 | p2
(1 row)
Both this (wrong output)...
I had noticed this, and also this crash:
tprt PARTITION BY RANGE(Col1)
tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
tprt_11 FOR VALUES FROM (1) TO (10000),
tprt_1d DEFAULT
tprt_2 FOR VALUES FROM (50001) TO (100001)
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
...and this (crash) were due to bugs in the 0005 patch.
Output with the updated patch for Rajkumar's test:
explain (costs off ) select * from rp where a <= 1;
QUERY PLAN
--------------------------
Append
-> Seq Scan on rp_p2
Filter: (a <= 1)
-> Seq Scan on rp_p1
Filter: (a <= 1)
(5 rows)
select tableoid::regclass, * from rp where a <= 1;
tableoid | a | b
----------+----+----
rp_p2 | 1 | p2
rp_p1 | -1 | p1
(2 rows)
-- moreover
select tableoid::regclass, * from rp where a < 1;
tableoid | a | b
----------+----+----
rp_d | -1 | p1
(1 row)
Should be fixed in the attached updated version. While fixing the bugs, I
made some significant revisions to the code introduced by 0005.
No significant changes to any of the patches 0001-0004.
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v3.patch (text/plain)
From 823d3bff6e067172da3ba97b771e1bb32681f629 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 733 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 105 +++++
4 files changed, 840 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..61c4596bc7
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,733 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(17 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(13 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(13 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(7 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(17 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(13 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(13 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(9 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+drop table lp, coll_pruning, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..d716fee69f
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,105 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
0002-Planner-side-changes-for-partition-pruning-v3.patch (text/plain; charset=UTF-8)
From fa5e4f977d8db8ad54bb10fbb026d37bb329bd67 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size() does
that normally, but for partitioned tables, it will only do it
for the *live* partitions, but partitionwise-join code would
look at *all* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each matched clause will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
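For anyone skimming the diff, the index-gathering step in get_append_rel_partitions() may be easier to follow outside the planner. The following is only a standalone sketch, not code from the patch: collect_partition_indexes() is a hypothetical name, and a plain uint64_t bitmask stands in for the Bitmapset, so it only covers partition indexes 0..63. Like the patch, it assumes the extra indexes do not overlap the contiguous range.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for ints; same contract as intcmp() in the patch. */
static int
int_cmp(const void *va, const void *vb)
{
    int a = *(const int *) va;
    int b = *(const int *) vb;

    return (a > b) - (a < b);
}

/*
 * Hypothetical helper: merge the contiguous range [min_idx, max_idx]
 * (ignored if either bound is negative) with the extra indexes given as
 * set bits of other_parts, and return all of them in ascending order in
 * a malloc'd array.  *nparts receives the number of entries.  Returns
 * NULL when no partition survives.
 */
static int *
collect_partition_indexes(int min_idx, int max_idx,
                          uint64_t other_parts, int *nparts)
{
    int  have_range = (min_idx >= 0 && max_idx >= min_idx);
    int  n = 0;
    int  j = 0;
    int  i;
    int *result;

    if (have_range)
        n += max_idx - min_idx + 1;
    for (i = 0; i < 64; i++)
        if (other_parts & (UINT64_C(1) << i))
            n++;

    *nparts = n;
    if (n == 0)
        return NULL;

    result = malloc(n * sizeof(int));
    assert(result != NULL);

    if (have_range)
        for (i = min_idx; i <= max_idx; i++)
            result[j++] = i;
    for (i = 0; i < 64; i++)
        if (other_parts & (UINT64_C(1) << i))
            result[j++] = i;
    assert(j == n);

    /* Sort so that entries line up with partition descriptor order. */
    qsort(result, n, sizeof(int), int_cmp);
    return result;
}
```

The sort matters because the caller indexes rel->part_appinfos with these values and expects the resulting list to follow the partition descriptor's order.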
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 776 insertions(+), 118 deletions(-)
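The patch rewrites a `key <> val` clause into `key < val OR key > val` (see process_partition_ne_op() in allpaths.c below). As a rough illustration of why that form is usable for pruning, here is a toy model, not PostgreSQL code: ToyPartition and prune_not_equal() are made-up names, and each list partition is reduced to the smallest and largest value it accepts. A partition survives if either arm of the OR can be true for some value it holds.

```c
#include <assert.h>
#include <stdint.h>

/* Toy list partition: only the smallest and largest accepted values. */
typedef struct ToyPartition
{
    int min_val;
    int max_val;
} ToyPartition;

/*
 * Hypothetical pruning of "key <> val", evaluated as
 * "key < val OR key > val".  Returns a bitmask with bit i set if
 * partition i must still be scanned.  Supports at most 64 partitions.
 */
static uint64_t
prune_not_equal(const ToyPartition *parts, int nparts, int val)
{
    uint64_t live = 0;
    int      i;

    assert(nparts >= 0 && nparts <= 64);

    for (i = 0; i < nparts; i++)
    {
        int lt_possible = parts[i].min_val < val;   /* key < val arm */
        int gt_possible = parts[i].max_val > val;   /* key > val arm */

        if (lt_possible || gt_possible)
            live |= UINT64_C(1) << i;
    }
    return live;
}
```

In this model only a partition that accepts nothing but val itself gets pruned, which matches the intuition that `<>` rarely eliminates much for range partitioning but can drop single-value list partitions.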
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..862309263d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned in
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including any NullTest
+ * clauses against partition keys. constfalse is set if a constant-false
+ * pseudoconstant clause is found.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result list as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely
+ * about the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with
+ * partitioning. If it is, we turn this into an OR BoolExpr:
+ * (key < val OR key > val), if the partitioning method
+ * supports such a notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to the result. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For partitioned tables, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ecdd7280eb..d9bbf20acb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6160,14 +6160,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
0003-Implement-get_partitions_from_clauses-v3.patch
From 72ffddb1c510a256d862dd183fbbeed6a2d68958 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1034 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1031 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..362ebba75b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartitionScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartitionScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartitionScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set.) Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static Datum partkey_datum_from_expr(const Expr *expr);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartitionScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1439,15 +1554,928 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionSet *partset;
+
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartitionScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, rt_index,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (!a->empty && !b->all_parts)
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because the
+ * indexes in this set no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * nullness constraints and return that information in the output argument
+ * *keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Minimum and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that is false, *constfalse
+ * is set to true and no keys are set. It is also set if we encounter mutually
+ * contradictory clauses in this function ourselves, for example, having both
+ * a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ memset(keynullness, 0, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ keynullness[i] = -1;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (j = 0; j < num_elems; j++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[j])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to *or_clauses. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartitionScanKeyInfo));
+ for (i = 0; i < n_eqkeys; i++)
+ keys->eqkeys[i] = partkey_datum_from_expr(eqkey_exprs[i]);
+ keys->n_eqkeys = n_eqkeys;
+
+ for (i = 0; i < n_minkeys; i++)
+ keys->minkeys[i] = partkey_datum_from_expr(minkey_exprs[i]);
+ keys->n_minkeys = n_minkeys;
+ keys->min_incl = min_incl;
+
+ for (i = 0; i < n_maxkeys; i++)
+ keys->maxkeys[i] = partkey_datum_from_expr(maxkey_exprs[i]);
+ keys->n_maxkeys = n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+
+ return n_eqkeys + n_minkeys + n_maxkeys + n_keynullness;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and return it as a Datum
+ */
+static Datum
+partkey_datum_from_expr(const Expr *expr)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ return ((Const *) expr)->constvalue;
+
+ default:
+ elog(ERROR, "invalid expression for partition key");
+ }
+
+ Assert(false); /* should never get here! */
+ return 0;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for this partition key column
+ * for each btree strategy number. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ leftarg_const = partkey_datum_from_expr(leftarg->constarg);
+ rightarg_const = partkey_datum_from_expr(rightarg->constarg);
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartitionScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
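
To illustrate the idea behind remove_redundant_clauses() outside the planner, here is a minimal standalone sketch, assuming plain int constants and a single column. The names Clause, clause_holds and reduce_clauses are made up for this illustration and are not part of the patch. It also leaves out the patch's final step of merging < with <= and > with >=, and the case where two clauses cannot be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the five btree strategy numbers. */
enum { ST_LT = 1, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX = 5 };

typedef struct Clause
{
    int strategy;   /* one of ST_LT .. ST_GT */
    int value;      /* constant operand of "col <op> value" */
} Clause;

/* Evaluate "left <strategy> right" for plain ints. */
static bool
clause_holds(int strategy, int left, int right)
{
    switch (strategy)
    {
        case ST_LT: return left < right;
        case ST_LE: return left <= right;
        case ST_EQ: return left == right;
        case ST_GE: return left >= right;
        case ST_GT: return left > right;
    }
    return false;
}

/*
 * Keep the most restrictive clause per strategy in best[], with has[]
 * telling which slots are filled.  Returns false if the clauses are
 * contradictory (the patch reports this through *constfalse).
 */
static bool
reduce_clauses(const Clause *in, int n, int best[ST_MAX], bool has[ST_MAX])
{
    int i, s;

    for (s = 0; s < ST_MAX; s++)
        has[s] = false;

    for (i = 0; i < n; i++)
    {
        assert(in[i].strategy >= ST_LT && in[i].strategy <= ST_GT);
        s = in[i].strategy - 1;
        if (!has[s])
        {
            best[s] = in[i].value;
            has[s] = true;
        }
        else if (clause_holds(in[i].strategy, in[i].value, best[s]))
            best[s] = in[i].value;  /* "new <op> old" holds: new one wins */
        else if (in[i].strategy == ST_EQ)
            return false;           /* col = a AND col = b with a != b */
    }

    /* An equality key makes every compatible inequality redundant. */
    if (has[ST_EQ - 1])
    {
        for (s = 0; s < ST_MAX; s++)
        {
            if (!has[s] || s == ST_EQ - 1)
                continue;
            if (!clause_holds(s + 1, best[ST_EQ - 1], best[s]))
                return false;       /* e.g. col = 5 AND col < 3 */
            has[s] = false;
        }
    }
    return true;
}
```

For example, feeding it a < 10, a < 5 and a > 1 should leave a < 5 and a > 1, while a = 5 together with a < 3 is reported as contradictory.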
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v3.patch (text/plain; charset=UTF-8)
From e7aa6826c055c3c02bd902bc9bee4ed3c31d644f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 362ebba75b..73f4e7ab95 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3475,12 +3510,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3492,6 +3530,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3522,12 +3561,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3724,12 +3764,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3751,11 +3791,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3763,17 +3803,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3784,12 +3842,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3803,20 +3862,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3829,8 +3887,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
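
As a rough illustration of the contract that partition_bound_bsearch() documents (greatest bound that is <= the probe, -1 if none, plus an is_equal flag), here is a small standalone sketch over a sorted int array. The function name bound_bsearch and the use of plain ints instead of PartitionBoundCmpArg are assumptions made for the example; only the mid computation and the "lo = mid when cmpval <= 0" step are taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the index of the greatest element of the sorted array
 * bounds[0..n-1] that is <= probe, or -1 if every element is greater.
 * *is_equal is set to whether the element at the returned index equals
 * probe.
 */
static int
bound_bsearch(const int *bounds, int n, int probe, bool *is_equal)
{
    int lo = -1;
    int hi = n - 1;

    assert(n >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        /* Round up so that the loop always makes progress. */
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= probe)
        {
            lo = mid;
            *is_equal = (bounds[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

With bounds {10, 20, 30}, probing for 25 lands on index 1 without an exact match, and probing for 5 gives -1, which is the case callers have to treat as "below the first bound".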
0005-Implement-get_partitions_for_keys-v3.patch (text/plain; charset=UTF-8)
From 5760054b91f7f115b3690432d1ee72a3db7770f3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 356 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 64 ++----
3 files changed, 371 insertions(+), 53 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 73f4e7ab95..c5875dc064 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -178,7 +178,7 @@ typedef struct PartitionScanKeyInfo
typedef struct PartitionSet
{
/*
- * If either empty or all_parts is true, values of the other fields are
+ * If either empty or all is true, values of the other fields are
* invalid.
*/
bool empty; /* contains no partitions */
@@ -1121,7 +1121,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
{
List *new_part_constraints;
List *def_part_constraints;
- List *all_parts;
+ List *all;
ListCell *lc;
new_part_constraints = (new_spec->strategy == PARTITION_STRATEGY_LIST)
@@ -1148,12 +1148,12 @@ check_default_allows_bound(Relation parent, Relation default_rel,
* that do not satisfy the revised partition constraints.
*/
if (default_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- all_parts = find_all_inheritors(RelationGetRelid(default_rel),
+ all = find_all_inheritors(RelationGetRelid(default_rel),
AccessExclusiveLock, NULL);
else
- all_parts = list_make1_oid(RelationGetRelid(default_rel));
+ all = list_make1_oid(RelationGetRelid(default_rel));
- foreach(lc, all_parts)
+ foreach(lc, all)
{
Oid part_relid = lfirst_oid(lc);
Relation part_rel;
@@ -2507,7 +2507,351 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff = -1,
+ minoff = -1,
+ maxoff = -1;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ eqoff = -1;
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (because, is_equal), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1,
+ * then, would be the upper bound of the leftmost partition
+ * that needs to be scanned.
+ */
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * itself is the upper bound of the rightmost partition that
+ * needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_default = false;
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of a
+ * range of values unassigned to any partition, move to the adjacent
+ * bound which instead must be the upper bound of the leftmost or
+ * rightmost partition, respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned.
+ * Include the default partition in that case. Although, if the
+ * original bound in question is an infinite value, there would not
+ * be any unassigned range, because the range is unbounded in that
+ * direction by definition.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int last_key;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_RANGE);
+ last_key = keys->n_minkeys > 0 ? keys->n_minkeys - 1
+ : partkey->partnatts - 1;
+ if (boundinfo->kind[minoff][last_key] == PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int last_key;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_RANGE);
+ last_key = keys->n_maxkeys > 0 ? keys->n_maxkeys - 1
+ : partkey->partnatts - 1;
+ maxoff -= 1;
+ if (boundinfo->kind[maxoff][last_key] == PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ }
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+ /*
+ * If minoff != maxoff, there might be datums in that range that
+ * don't have a non-default partition assigned.
+ */
+ include_default = (minoff != maxoff);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_default = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if (include_default && partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 61c4596bc7..69a7819171 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -198,16 +198,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -453,15 +451,13 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3nullxy
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(9 rows)
+(7 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -475,16 +471,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -502,9 +496,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -514,9 +506,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -530,9 +520,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -571,16 +559,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -592,9 +578,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -621,8 +605,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -630,9 +614,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -672,9 +654,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -712,9 +692,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -726,8 +704,6 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
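
The last step of get_partitions_for_keys() for range partitioning, walking the bounds between minoff and maxoff and noticing gaps, can be sketched on its own as follows. This is a simplified illustration, not the patch's code: select_range_parts and the selected[] flag array are hypothetical (the patch records min_part_idx/max_part_idx and a Bitmapset instead), and indexes[] is assumed to hold -1 for a bound that closes a range of values not assigned to any partition.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * indexes[i] is the partition whose upper bound is bound i, or -1 if the
 * values just below bound i belong to no partition.  Mark every partition
 * found in [minoff, maxoff] in selected[] and return true if a gap was
 * seen, meaning the default partition (if one exists) must be scanned too.
 */
static bool
select_range_parts(const int *indexes, int minoff, int maxoff,
                   bool *selected, int nparts)
{
    bool need_default = false;
    int  i;

    for (i = minoff; i <= maxoff; i++)
    {
        if (indexes[i] < 0)
            need_default = true;    /* unassigned range of values */
        else
        {
            assert(indexes[i] < nparts);
            selected[indexes[i]] = true;
        }
    }
    return need_default;
}
```

The point of the sketch is only that the default partition is pulled in when some offset in the selected span maps to no partition; how minoff and maxoff are adjusted beforehand is left to the patch.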
On 2017/10/25 15:47, Amit Langote wrote:
On 2017/10/24 1:38, Beena Emerson wrote:
I had noticed this and also that this crash:
tprt PARTITION BY RANGE(Col1)
tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
tprt_11 FOR VALUES FROM (1) TO (10000),
tprt_1d DEFAULT
tprt_2 FOR VALUES FROM (50001) TO (100001)
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
...and this (crash) were due to bugs in the 0005 patch.
[ .... ]
Should be fixed in the attached updated version.
Oops, not quite. The crash that Beena reported wasn't fixed (or rather, it
was reintroduced by some unrelated change after I had once confirmed it was fixed).
Really fixed this time.
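For anyone who wants to re-check the reported case, the quoted partition layout can be written out as a script roughly like the one below. This is only a sketch: the quoted report shows the partition bounds but not the column definition, so the single integer column col1 is an assumption, and the statements need a server that supports default partitions of range-partitioned tables.

```sql
-- Assumption: one integer column named col1; the report shows only the bounds.
CREATE TABLE tprt (col1 int) PARTITION BY RANGE (col1);

-- First-level partition that is itself range-partitioned on the same column.
CREATE TABLE tprt_1 PARTITION OF tprt
    FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE (col1);
CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (10000);
CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;

-- Second first-level partition, a plain leaf.
CREATE TABLE tprt_2 PARTITION OF tprt FOR VALUES FROM (50001) TO (100001);

-- The query from the report that crashed the server.
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;

-- Clean up.
DROP TABLE tprt;
```

Going by the bounds alone, the range 20000..70000 does not overlap tprt_11, so a correct plan should need to scan at most tprt_1d and tprt_2.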
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v4.patch (text/plain; charset=UTF-8)
From b7b2db5d70865b7a1e4124e639aa36dfec59d984 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 770 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 108 +++++
4 files changed, 880 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..c0365f0b52
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,770 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(19 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(15 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(15 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(19 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(13 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(15 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+drop table lp, coll_pruning, rlp, mc3p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..093f676f26
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,108 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v4.patch (text/plain; charset=UTF-8)
From 28dc86c17c8597a3015f86d74afad3d577d1d61b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size() does
that normally, but for partitioned tables, it will only do it
for the *live* partitions, but partitionwise-join code would
look at *all* partitions)
If the clauses identified in 2 above do not contain values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
when all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each of the matched clauses will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
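A note on the output convention of get_partitions_from_clauses(): the caller
receives a contiguous index range plus a set of stray indexes and has to
flatten both into one sorted array before looking up the appinfos. Below is a
minimal standalone sketch of that flattening step, not the patch's code. It
assumes a plain 64-bit mask in place of a Bitmapset, and the names
collect_part_indexes and cmp_int are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for ints; plays the role of intcmp() in the patch. */
static int
cmp_int(const void *va, const void *vb)
{
	int			a = *(const int *) va;
	int			b = *(const int *) vb;

	return (a > b) - (a < b);
}

/*
 * Hypothetical helper: write the indexes in [min_idx, max_idx] plus the
 * indexes of the bits set in other_mask into out[], sorted ascending.
 * A negative bound means there is no contiguous range.  The caller must
 * size out[] to hold the range plus up to 64 stray indexes.
 * Returns the number of indexes written.
 */
static int
collect_part_indexes(int min_idx, int max_idx, uint64_t other_mask, int *out)
{
	int			n = 0;
	int			i;

	assert(out != NULL);

	/* First the contiguous range, if any. */
	if (min_idx >= 0 && max_idx >= 0)
	{
		for (i = min_idx; i <= max_idx; i++)
			out[n++] = i;
	}

	/* Then the partitions that fall outside the range. */
	for (i = 0; i < 64; i++)
	{
		if (other_mask & (UINT64_C(1) << i))
			out[n++] = i;
	}

	/* Sort so the result follows partition descriptor order. */
	if (n > 1)
		qsort(out, n, sizeof(int), cmp_int);

	return n;
}
```

Sorting matters because the appinfos are stored in partition descriptor order,
so the flattened indexes should follow that order too. With both bounds
negative and an empty mask, the helper writes nothing and returns 0.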
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 776 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..862309263d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. constfalse is set if
+ * a restriction clause is constant-false, so no partition need be scanned.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * returned list as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with
+ * partitioning. If it is, we turn this into an OR BoolExpr:
+ * (key < val OR key > val), if the partitioning method
+ * supports such a notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to the result. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list contains only its
+ * immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate to our list
+ * before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it'd
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ecdd7280eb..d9bbf20acb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6160,14 +6160,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
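For readers following along: the next patch (0003) describes a set of range partitions as an inclusive index range (min_part_idx..max_part_idx) and merges two such ranges into one only when the result stays contiguous. Below is a minimal standalone sketch of that merge rule. The names IdxRange, range_empty, range_mergeable and range_union are hypothetical and not part of the patch, and the sketch assumes that ranges which merely touch (max + 1 == min) count as mergeable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the range fields of a partition set. */
typedef struct IdxRange
{
    int min_idx;    /* inclusive lower end; -1 when the range is empty */
    int max_idx;    /* inclusive upper end; -1 when the range is empty */
} IdxRange;

static bool
range_empty(IdxRange r)
{
    return r.min_idx < 0 && r.max_idx < 0;
}

/*
 * True if the two inclusive ranges share an index or touch each other,
 * so that their union is still one contiguous range.
 */
static bool
range_mergeable(IdxRange a, IdxRange b)
{
    if (range_empty(a) || range_empty(b))
        return false;
    return a.min_idx <= b.max_idx + 1 && b.min_idx <= a.max_idx + 1;
}

/* Union of two mergeable ranges; the caller must check mergeability. */
static IdxRange
range_union(IdxRange a, IdxRange b)
{
    IdxRange r;

    assert(range_mergeable(a, b));
    r.min_idx = a.min_idx < b.min_idx ? a.min_idx : b.min_idx;
    r.max_idx = a.max_idx > b.max_idx ? a.max_idx : b.max_idx;
    return r;
}
```

When two ranges are not mergeable, a contiguous range can no longer describe the result, which is why the patch falls back to a bitmap of individual partition indexes in that case.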
Attachment: 0003-Implement-get_partitions_from_clauses-v4.patch
From 575f8a4e84f2124bd32548d3d8d02dac85ae61a6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
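
As an illustration of the classification step only (not code from this patch): for clauses on a single integer key column, the tightest lower bound can be picked as sketched below. LowerOp, LowerClause, LowerBound and best_lower_bound are hypothetical names, and the sketch assumes plain integer comparison instead of the partitioning operator family's comparison function.

```c
#include <stdbool.h>

/* Hypothetical operator kinds for lower-bound clauses: "a > c", "a >= c". */
typedef enum { OP_GT, OP_GE } LowerOp;

typedef struct LowerClause
{
    LowerOp op;
    int     value;
} LowerClause;

typedef struct LowerBound
{
    bool valid;       /* false if no clause was seen */
    int  value;
    bool inclusive;   /* true for >=, false for > */
} LowerBound;

/*
 * Pick the most restrictive lower bound among mutually ANDed clauses.
 * A larger constant wins; on a tie, the exclusive (>) bound is tighter.
 */
static LowerBound
best_lower_bound(const LowerClause *clauses, int nclauses)
{
    LowerBound best = {false, 0, false};
    int        i;

    for (i = 0; i < nclauses; i++)
    {
        bool incl = (clauses[i].op == OP_GE);

        if (!best.valid ||
            clauses[i].value > best.value ||
            (clauses[i].value == best.value && best.inclusive && !incl))
        {
            best.valid = true;
            best.value = clauses[i].value;
            best.inclusive = incl;
        }
    }
    return best;
}
```

For a > 1, a > 2 and a >= 5, this picks 5 as an inclusive lower bound.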
---
src/backend/catalog/partition.c | 1034 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1031 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..362ebba75b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartitionScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartitionScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartitionScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set.) Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static Datum partkey_datum_from_expr(const Expr *expr);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartitionScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1439,15 +1554,928 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionSet *partset;
+
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartitionScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, rt_index,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (!b->all_parts)
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if(!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * nullness constraints, and return that information in the output argument
+ * *keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound
+ * for the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant false clause, *constfalse is set
+ * to true and no keys are set. It is also set if we encounter mutually
+ * contradictory clauses in this function ourselves, for example, having
+ * both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ memset(keynullness, 0, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ keynullness[i] = -1;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: array element index, distinct from key index i */
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (j = 0; j < num_elems; j++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[j])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to *or_clauses. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartitionScanKeyInfo));
+ for (i = 0; i < n_eqkeys; i++)
+ keys->eqkeys[i] = partkey_datum_from_expr(eqkey_exprs[i]);
+ keys->n_eqkeys = n_eqkeys;
+
+ for (i = 0; i < n_minkeys; i++)
+ keys->minkeys[i] = partkey_datum_from_expr(minkey_exprs[i]);
+ keys->n_minkeys = n_minkeys;
+ keys->min_incl = min_incl;
+
+ for (i = 0; i < n_maxkeys; i++)
+ keys->maxkeys[i] = partkey_datum_from_expr(maxkey_exprs[i]);
+ keys->n_maxkeys = n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+
+ return n_eqkeys + n_minkeys + n_maxkeys + n_keynullness;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and return it as a Datum
+ */
+static Datum
+partkey_datum_from_expr(const Expr *expr)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ return ((Const *) expr)->constvalue;
+
+ default:
+ elog(ERROR, "invalid expression for partition key");
+ }
+
+ Assert(false); /* should never get here! */
+ return 0;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for each btree strategy number
+ * for the partattoff'th partition key column. Copy them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ leftarg_const = partkey_datum_from_expr(leftarg->constarg);
+ rightarg_const = partkey_datum_from_expr(rightarg->constarg);
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartitionScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
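As a rough illustration of what remove_redundant_clauses() in the patch above does for a single key column, here is a minimal standalone sketch over plain int constants. OpKind, Clause, eval_op() and reduce_clauses() are hypothetical names made up for this example; the real code works on PartClause nodes and looks up comparison operators in the partitioning operator family. Like the patch, the sketch does not try to detect contradictions between a "less than" and a "greater than" clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_COUNT } OpKind;

/* One clause of the form "col op value" (illustrative, not PartClause). */
typedef struct { OpKind op; int value; } Clause;

/* Evaluate "lhs op rhs". */
static bool
eval_op(OpKind op, int lhs, int rhs)
{
    switch (op)
    {
        case OP_LT: return lhs < rhs;
        case OP_LE: return lhs <= rhs;
        case OP_EQ: return lhs == rhs;
        case OP_GE: return lhs >= rhs;
        case OP_GT: return lhs > rhs;
        default:    return false;
    }
}

/*
 * Reduce a list of ANDed clauses on one column.  Surviving clauses are
 * written to out[] (caller provides room for OP_COUNT entries).  Returns
 * the number of survivors, or -1 if the clauses are contradictory.
 */
static int
reduce_clauses(const Clause *in, int n, Clause *out)
{
    const Clause *best[OP_COUNT] = {NULL};
    int i, s, nout = 0;

    assert(n >= 0);

    /* Keep only the most restrictive clause of each operator kind. */
    for (i = 0; i < n; i++)
    {
        const Clause *cur = &in[i];
        const Clause *old = best[cur->op];

        if (old == NULL || eval_op(cur->op, cur->value, old->value))
            best[cur->op] = cur;
        else if (cur->op == OP_EQ)
            return -1;              /* col = a AND col = b, a != b */
    }

    /* An equality clause makes the others redundant or contradictory. */
    if (best[OP_EQ] != NULL)
    {
        for (s = 0; s < OP_COUNT; s++)
        {
            if (s == OP_EQ || best[s] == NULL)
                continue;
            if (!eval_op((OpKind) s, best[OP_EQ]->value, best[s]->value))
                return -1;
            best[s] = NULL;
        }
    }

    /* Try to keep only one of <, <= and only one of >, >=. */
    if (best[OP_LT] && best[OP_LE])
    {
        if (best[OP_LT]->value <= best[OP_LE]->value)
            best[OP_LE] = NULL;
        else
            best[OP_LT] = NULL;
    }
    if (best[OP_GT] && best[OP_GE])
    {
        if (best[OP_GT]->value >= best[OP_GE]->value)
            best[OP_GE] = NULL;
        else
            best[OP_GT] = NULL;
    }

    for (s = 0; s < OP_COUNT; s++)
        if (best[s])
            out[nout++] = *best[s];
    return nout;
}
```

For instance, "a < 10 AND a < 5 AND a <= 7" reduces to the single clause "a < 5", while "a = 1 AND a = 3" is reported as contradictory.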
Attachment: 0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v4.patch (text/plain; charset=UTF-8)
From 22a1821c2223db8a6e6c3fa3afb3b7ce66a103af Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 362ebba75b..73f4e7ab95 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3475,12 +3510,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3492,6 +3530,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3522,12 +3561,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3724,12 +3764,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3751,11 +3791,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3763,17 +3803,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3784,12 +3842,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3803,20 +3862,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3829,8 +3887,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
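To make the point of the interface change above a bit more concrete, here is a small standalone sketch of a "greatest bound <= probe" binary search in which the probe carries its own datum count, so that only a prefix of the key columns is compared. Probe, bound_cmp() and bound_bsearch() are hypothetical names for this example, the key is assumed to be two int columns, and bounds are plain int arrays rather than PartitionBoundInfo entries.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2                 /* assumed number of partition key columns */

/* Illustrative probe: the first ndatums key columns are specified. */
typedef struct
{
    const int *datums;
    int        ndatums;
} Probe;

/* Compare a bound with the probe on the probe's first ndatums columns. */
static int
bound_cmp(const int *bound, const Probe *probe)
{
    int i;

    assert(probe->ndatums >= 1 && probe->ndatums <= NKEYS);
    for (i = 0; i < probe->ndatums; i++)
    {
        if (bound[i] != probe->datums[i])
            return bound[i] < probe->datums[i] ? -1 : 1;
    }
    return 0;                   /* equal on the compared prefix */
}

/*
 * Return the offset of a bound that is <= the probe, or -1 if every bound
 * is greater.  *is_equal tells whether the returned bound compared equal.
 * With a prefix probe, several adjacent bounds may compare equal and the
 * search may stop at any one of them.
 */
static int
bound_bsearch(const int bounds[][NKEYS], int nbounds,
              const Probe *probe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmpval = bound_cmp(bounds[mid], probe);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

With bounds (1,1), (10,5), (10,10), (20,0), a full probe (10,7) lands on offset 1 without equality, whereas the prefix probe (10) compares equal to both offsets 1 and 2. That ambiguity is why a caller that passes fewer datums than key columns has to walk left or right from the returned offset.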
Attachment: 0005-Implement-get_partitions_for_keys-v4.patch (text/plain; charset=UTF-8)
From b0f307b1c79f9313bb80c8185a16d5e2dba9424e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
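A simplified picture of the equality-key lookup for range partitioning in this patch, as a standalone sketch. It assumes a single int key column and an indexes[] array with one more entry than bounds[], where indexes[i] is the partition whose upper bound is bounds[i] and -1 marks a range of values not assigned to any partition. That layout is an assumption modeled on how the existing code reads boundinfo->indexes[offset + 1]; range_partition_for_value() and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Return the index of the partition that may contain "value", falling
 * back to default_index (pass -1 if there is no default partition) when
 * the value lies in an unassigned range.
 */
static int
range_partition_for_value(const int *bounds, const int *indexes,
                          int nbounds, int default_index, int value)
{
    int lo = -1;
    int hi = nbounds - 1;

    assert(nbounds > 0);

    /* Find the greatest bound that is <= value, or -1 if none. */
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= value)
            lo = mid;
        else
            hi = mid - 1;
    }

    /*
     * The bound at lo + 1, if any, is the upper bound of the only
     * partition that could hold the value.
     */
    if (indexes[lo + 1] >= 0)
        return indexes[lo + 1];

    return default_index;
}
```

With bounds 1, 10, 20, 30 and indexes -1, 0, -1, 1, -1 (partition 0 covering [1, 10), partition 1 covering [20, 30)), the value 5 maps to partition 0, 25 maps to partition 1, and 15 falls in the gap, so only the default partition, if one exists, needs to be scanned.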
src/backend/catalog/partition.c | 347 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 64 ++----
3 files changed, 367 insertions(+), 48 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 73f4e7ab95..1e27f72d6e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2507,7 +2507,352 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff = -1,
+ minoff = -1,
+ maxoff = -1;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ eqoff = -1;
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (because, is_equal), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * itself is the upper bound of the rightmost partition that
+ * needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_default = false;
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of a
+ * range of values unassigned to any partition, move to the adjacent
+ * bound which instead must be the upper bound of the leftmost or
+ * rightmost partition, respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned.
+ * Include the default partition in that case. Although, if the
+ * original bound in question is an infinite value, there would not
+ * be any unassigned range, because the range is unbounded in that
+ * direction by definition.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int last_key;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_RANGE);
+ last_key = keys->n_minkeys > 0 ? keys->n_minkeys - 1
+ : partkey->partnatts - 1;
+ if (boundinfo->kind[minoff][last_key] == PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int last_key;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_RANGE);
+ last_key = keys->n_maxkeys > 0 ? keys->n_maxkeys - 1
+ : partkey->partnatts - 1;
+ maxoff -= 1;
+ if (boundinfo->kind[maxoff][last_key] == PARTITION_RANGE_DATUM_VALUE)
+ include_default = true;
+ }
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+ /*
+ * If minoff != maxoff, there might be datums in that range
+ * that don't have a non-default partition assigned.
+ */
+ include_default = (minoff != maxoff);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_default = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if (include_default && partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index c0365f0b52..5c64e60ebe 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -199,16 +199,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -488,8 +486,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -498,7 +494,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -512,16 +508,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -539,9 +533,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -551,9 +543,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -567,9 +557,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -608,16 +596,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -629,9 +615,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -658,8 +642,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -667,9 +651,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -709,9 +691,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -749,9 +729,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -763,8 +741,6 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
drop table lp, coll_pruning, rlp, mc3p;
--
2.11.0
Hello Amit,
Thanks for the updated patches.
On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/10/25 15:47, Amit Langote wrote:
On 2017/10/24 1:38, Beena Emerson wrote:
I had noticed this and also that this crash:
tprt PARTITION BY RANGE(Col1)
tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
tprt_11 FOR VALUES FROM (1) TO (10000),
tprt_1d DEFAULT
tprt_2 FOR VALUES FROM (50001) TO (100001)
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
...and this (crash) were due to bugs in the 0005 patch.
[ .... ]
Should be fixed in the attached updated version.
Oops, not quite. The crash that Beena reported wasn't fixed (or rather
reintroduced by some unrelated change after once confirming it was fixed).
Really fixed this time.
The crashes are fixed. However, handling of DEFAULT partition in
various queries is not proper.
Case 1: In this case default should be selected.
DROP TABLE tprt;
CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY list(col1);
CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES IN (20000, 25000);
CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES IN (50000, 35000);
CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES IN (10000);
CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
Case 2: In this case DEFAULT need not be selected.
DROP TABLE tprt;
CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY range(col1);
CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (10000);
CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES FROM (10000) TO (20000);
CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES FROM (20000) TO (30000);
CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
INSERT INTO tprt SELECT generate_series(1,50000), generate_series(1,50000);
postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
QUERY PLAN
--------------------------------
Append
-> Seq Scan on tprt_11
Filter: (col1 < 10000)
-> Seq Scan on tprt_1d
Filter: (col1 < 10000)
(5 rows)
--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
On Wed, Oct 25, 2017 at 1:07 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/10/25 15:47, Amit Langote wrote:
On 2017/10/24 1:38, Beena Emerson wrote:
I had noticed this and also that this crash:
tprt PARTITION BY RANGE(Col1)
tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
tprt_11 FOR VALUES FROM (1) TO (10000),
tprt_1d DEFAULT
tprt_2 FOR VALUES FROM (50001) TO (100001)
EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 20000 AND 70000;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
...and this (crash) were due to bugs in the 0005 patch.
[ .... ]
Should be fixed in the attached updated version.
Oops, not quite. The crash that Beena reported wasn't fixed (or rather
reintroduced by some unrelated change after once confirming it was fixed).
Really fixed this time.
Some minor comments:
1. wrong function name (0003)
The comment on function get_partitions_from_clauses_guts uses the wrong name:
"_using_" is written instead of "_from_".
/*
+ * get_partitions_using_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
2. typo information (0003)
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that informatin in the output argument
3. misspell admissible (0003)
+ * columns, whereas a prefix of all partition key columns is addmissible
+ * as min and max keys.
4. double and? (0002)
+ * as part of the operator family, check if its negator
+ * exists and and that the latter is compatible with
5. typo inequality (0002)
+ * (key < val OR key > val), if the partitioning method
+ * supports such notion of inequlity.
6. typo output (0005)
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its outout.
7. typo provide (0005)
+ /* Valid keys->eqkeys must provoide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
8. comment of struct PartClause (0003)
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid repeated recomputation in remove_redundant_clauses().
+ */
Instead of "repeated recomputation", we can use just "repeated
computation" or just "recomputation".
--
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Beena,
Thanks for the tests.
On 2017/10/25 18:18, Beena Emerson wrote:
The crashes are fixed. However, handling of DEFAULT partition in
various queries is not proper.
Case 1: In this case default should be selected.
DROP TABLE tprt;
CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY list(col1);
CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES IN (20000, 25000);
CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES IN (50000, 35000);
CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES IN (10000);
CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
Hmm, this clearly looks wrong. Fixed in the attached.
Case 2: In this case DEFAULT need not be selected.
DROP TABLE tprt;
CREATE TABLE tprt (col1 int, col2 int) PARTITION BY range(col1);
CREATE TABLE tprt_1 PARTITION OF tprt FOR VALUES FROM (1) TO (50001)
PARTITION BY range(col1);
CREATE TABLE tprt_11 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (10000);
CREATE TABLE tprt_12 PARTITION OF tprt_1 FOR VALUES FROM (10000) TO (20000);
CREATE TABLE tprt_13 PARTITION OF tprt_1 FOR VALUES FROM (20000) TO (30000);
CREATE TABLE tprt_1d PARTITION OF tprt_1 DEFAULT;
INSERT INTO tprt SELECT generate_series(1,50000), generate_series(1,50000);
postgres=# EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 < 10000;
QUERY PLAN
--------------------------------
Append
-> Seq Scan on tprt_11
Filter: (col1 < 10000)
-> Seq Scan on tprt_1d
Filter: (col1 < 10000)
(5 rows)
Yeah, ideally. But it's kind of hard for the new partition-pruning
algorithm to be *that* correct in this particular case involving default
partitions. Let me try to explain why I think it may be a bit hard to
implement.
I have perhaps mentioned before that the new partition-pruning algorithm
runs for every partitioned table in the tree separately. In this example,
it will first determine for the root table tprt that only the partition
tprt_1 needs to be scanned. Since tprt_1 is itself partitioned, the
algorithm will run again, but the fact that tprt_1 (iow, any of its
partitions) is itself constrained to the range [1, 50001) is, for the most
part, lost on the algorithm. Note that non-default partitions (tprt_11,
tprt_12, ...) have bound datums in PartitionBoundInfo describing the range
of data they contain, which the algorithm uses to determine the set of
partitions satisfying a given set of clauses. The default partition has no
datums. The only thing describing what it contains is its partition
constraint. From the clause col1 < 10000, the algorithm will conclude that
the default partition might contain some data satisfying the same, because
it knows for sure that there is no non-default partition for keys < 1.
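To make the behaviour described above concrete, here is a toy model in C of pruning a single level of a range-partitioned tree. It is only a sketch under simplifying assumptions (a single integer key, half-open bounds, sorted partitions); the names ToyRangePart, toy_prune_one_level, TOY_MINVALUE and TOY_MAXVALUE are made up for illustration and do not come from the patches.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an unbounded end of the query range. */
#define TOY_MINVALUE INT_MIN
#define TOY_MAXVALUE INT_MAX

/* Toy model of one non-default range partition holding keys in [lo, hi). */
typedef struct ToyRangePart
{
    const char *name;
    int         lo;
    int         hi;
} ToyRangePart;

/*
 * Prune one level of the tree for the non-empty key range [qlo, qhi).
 *
 * scan[i] is set for every non-default partition whose bounds overlap the
 * query range.  The return value says whether the default partition must
 * be scanned too: it has no bounds of its own, so it is kept whenever some
 * part of the query range is not covered by a listed partition.  Nothing
 * here knows about any constraint inherited from the parent.
 */
bool
toy_prune_one_level(const ToyRangePart *parts, int nparts,
                    int qlo, int qhi, bool *scan)
{
    int         covered_upto = qlo; /* [qlo, covered_upto) is covered */
    bool        gap = false;
    int         i;

    assert(nparts >= 0);
    assert(qlo < qhi);
    for (i = 0; i < nparts; i++)
    {
        /* bounds are assumed sorted and non-overlapping */
        assert(i == 0 || parts[i - 1].hi <= parts[i].lo);

        scan[i] = parts[i].lo < qhi && parts[i].hi > qlo;
        if (!scan[i])
            continue;
        if (parts[i].lo > covered_upto)
            gap = true;         /* keys below this partition are unclaimed */
        covered_upto = parts[i].hi;
    }
    return gap || covered_upto < qhi;
}
```

Called with the three non-default bounds of tprt_1 from Case 2 and the range TOY_MINVALUE up to 10000 (standing in for col1 < 10000), this selects only tprt_11 among the listed partitions but still reports that the default must be scanned, because keys below 1 belong to no listed partition at this level.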
It can perhaps be taught not to make that conclusion by taking into account
the default partition's partition constraint, which includes the constraint
inherited from the parent, viz. 1 <= col1 < 50001. To do that, it might
be possible to summon up predtest.c's powers to conclude from the default
partition's partition constraint that it cannot contain any keys < 1, but
then we'll have to frame up a clause expression describing the latter.
Generating such a clause expression can be a bit daunting for a
multi-column key. So, I haven't yet tried really hard to implement this.
Any thoughts on that?
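For the single-column case only, the idea of taking the inherited constraint into account can be sketched as follows. This is a toy illustration, not the predtest.c-based approach mentioned above and not code from the patches: it assumes one integer key and half-open ranges, and ToyRange, toy_default_needed, TOY_MINVALUE and TOY_MAXVALUE are hypothetical names. It says nothing about how a multi-column key would be handled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an unbounded end of a range. */
#define TOY_MINVALUE INT_MIN
#define TOY_MAXVALUE INT_MAX

/* Toy model of a half-open key range [lo, hi). */
typedef struct ToyRange
{
    int         lo;
    int         hi;
} ToyRange;

/*
 * Decide whether the default partition can hold a key matching the query
 * range q.  bounds[] are the sorted, non-overlapping ranges of the
 * non-default partitions; parent is the range the partitioned table
 * itself is constrained to (what the default partition inherits).
 */
bool
toy_default_needed(const ToyRange *bounds, int nbounds,
                   ToyRange q, ToyRange parent)
{
    /* Clamp the query range to what the parent can contain at all. */
    int         lo = q.lo > parent.lo ? q.lo : parent.lo;
    int         hi = q.hi < parent.hi ? q.hi : parent.hi;
    int         i;

    assert(nbounds >= 0);

    /* Walk the bounds, advancing lo past every covered stretch. */
    for (i = 0; i < nbounds && lo < hi; i++)
    {
        if (bounds[i].hi <= lo)
            continue;           /* entirely below what remains */
        if (bounds[i].lo > lo)
            return true;        /* keys in [lo, bounds[i].lo) are unclaimed */
        lo = bounds[i].hi;
    }
    return lo < hi;             /* anything left over goes to the default */
}
```

With the Case 2 bounds, the query range for col1 < 10000 and the parent range [1, 50001), the clamped range is [1, 10000), which tprt_11 covers entirely, so the default would not be needed. Passing an unconstrained parent range (TOY_MINVALUE to TOY_MAXVALUE) gives the current behaviour, where the default is kept.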
Meanwhile, attached is an updated set of patches including fixes for the
typos you reported in the other message. Updated 0005 fixes the first bug
(Case 1 in your email), while the other patches 0002-0004 are updated mostly
to fix the reported typos. A couple of tests are added in 0001 to test the
default partition case a bit more.
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v5.patch (text/plain; charset=UTF-8)
From 126db4e77ac7d6d684e1ec97430cbbdc5d142432 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 841 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 125 +++++
4 files changed, 968 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..cda067da3a
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,841 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(19 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4
+ Filter: (a > 10)
+ -> Seq Scan on rlp5
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(15 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(15 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp5
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(5 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(19 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(13 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(15 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..b71849959d
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,125 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30);
+create table rlp5 partition of rlp for values from (31) to (maxvalue);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v5.patch (text/plain; charset=UTF-8)
From f478c8a815c9e354a586b48e2bcb86e2dc38672f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size() does
that normally, but for partitioned tables, it will only do it
for the *live* partitions, but partitionwise-join code would
look at *all* partitions)
If the clauses identified in 2 above do not contain values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
when all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each of the matched clauses will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
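
As a rough illustration of the output convention of get_partitions_from_clauses (a contiguous range plus a set of stray indexes) and of how the caller flattens it into one sorted array, here is a standalone sketch. It is not the patch's code: collect_partition_indexes() is a hypothetical helper, a plain 64-bit mask stands in for the Bitmapset, and malloc replaces palloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for ints, same purpose as intcmp() in the patch. */
static int
int_cmp(const void *va, const void *vb)
{
    int a = *(const int *) va;
    int b = *(const int *) vb;

    return (a > b) - (a < b);
}

/*
 * Hypothetical helper: merge the contiguous range [min_idx, max_idx] with
 * the indexes set in other_parts (bit i set means partition i) into one
 * ascending array.  A negative min_idx or max_idx means "no range".  As in
 * the patch, other_parts is assumed not to overlap the range.
 *
 * Returns a malloc'd array and stores its length in *num_parts; returns
 * NULL (with *num_parts set to 0) when no partition qualifies.
 */
static int *
collect_partition_indexes(int min_idx, int max_idx, uint64_t other_parts,
                          int *num_parts)
{
    int  has_range = (min_idx >= 0 && max_idx >= min_idx);
    int  n = 0;
    int  j = 0;
    int  i;
    int *all;

    if (has_range)
        n += max_idx - min_idx + 1;
    for (i = 0; i < 64; i++)
        if (other_parts & (UINT64_C(1) << i))
            n++;

    *num_parts = n;
    if (n == 0)
        return NULL;

    all = malloc((size_t) n * sizeof(int));
    assert(all != NULL);

    if (has_range)
        for (i = min_idx; i <= max_idx; i++)
            all[j++] = i;
    for (i = 0; i < 64; i++)
        if (other_parts & (UINT64_C(1) << i))
            all[j++] = i;

    /* Keep partition-descriptor order so appinfos are fetched in order. */
    qsort(all, (size_t) n, sizeof(int), int_cmp);
    return all;
}
```

With the stub in this commit, the range is always 0 .. nparts - 1 and the extra set is empty, so the flattened array simply lists every partition.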
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 776 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..e4427b8a58 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned in
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * an invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's recorded in the
+ * PartClauseValSet as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ * If it is, we turn this into an OR BoolExpr: (key < val OR
+ * key > val), if the partitioning method supports such
+ * notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and generate its PartClauseSetOr. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(ERROR, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, it will contain only its
+ * immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate to our list
+ * before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's type,
+ * store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
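
To illustrate the clause rewriting that patch 0002 performs before pruning, here is a small standalone sketch. It is a toy model and not code from the patch: the three integer range partitions, the PartMask type, and the parts_* helpers are all made up for illustration. It shows why 'key <> c' can be handled as 'key < c OR key > c', why 'key IN (...)' becomes an OR of equality clauses, and why 'key NOT IN (...)' becomes a set of '<>' clauses that are ANDed together.

```c
#include <stddef.h>

#define NPARTS 3

/* Hypothetical partitions on an int key: partition i holds [lo[i], hi[i]). */
static const int part_lo[NPARTS] = {0, 10, 11};
static const int part_hi[NPARTS] = {10, 11, 30};

typedef unsigned int PartMask;  /* bit i set => partition i must be scanned */

/* Partitions that can contain a row with key < c. */
static PartMask parts_lt(int c)
{
    PartMask m = 0;

    for (int i = 0; i < NPARTS; i++)
        if (part_lo[i] < c)
            m |= 1u << i;
    return m;
}

/* Partitions that can contain a row with key > c (largest key is hi - 1). */
static PartMask parts_gt(int c)
{
    PartMask m = 0;

    for (int i = 0; i < NPARTS; i++)
        if (part_hi[i] - 1 > c)
            m |= 1u << i;
    return m;
}

/* Partitions that can contain a row with key = c. */
static PartMask parts_eq(int c)
{
    PartMask m = 0;

    for (int i = 0; i < NPARTS; i++)
        if (part_lo[i] <= c && c < part_hi[i])
            m |= 1u << i;
    return m;
}

/* key <> c is treated as (key < c OR key > c): union of the two sets. */
static PartMask parts_ne(int c)
{
    return parts_lt(c) | parts_gt(c);
}

/* key IN (v1, ..., vn) is an OR of equalities: union over the elements. */
static PartMask parts_in(const int *vals, size_t n)
{
    PartMask m = 0;

    for (size_t i = 0; i < n; i++)
        m |= parts_eq(vals[i]);
    return m;
}

/* key NOT IN (v1, ..., vn) is an AND of '<>' clauses: intersection. */
static PartMask parts_not_in(const int *vals, size_t n)
{
    PartMask m = (1u << NPARTS) - 1;    /* start with all partitions */

    for (size_t i = 0; i < n; i++)
        m &= parts_ne(vals[i]);
    return m;
}
```

With these made-up bounds, the middle partition holds only the value 10, so parts_ne(10) leaves it out while keeping the other two. In general a '<>' clause can prune a partition only when that partition accepts nothing but the excluded value, which is mostly the case with list partitioning.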
0003-Implement-get_partitions_from_clauses-v5.patch (text/plain; charset=UTF-8)
From 84b77f2380a1669b374d843b4954cb6e3f0880e5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1036 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1032 insertions(+), 4 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..8d508e549b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartitionScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartitionScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartitionScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes,
+ * inclusive, are part of the set). Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static Datum partkey_datum_from_expr(const Expr *expr);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartitionScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1439,15 +1554,928 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionSet *partset;
+
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartitionScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, 0,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->use_range = in->use_range; /* partset_new() defaults to true */
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
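+ /* If a is empty or b has all partitions, a is already the result. */
+ else if (!a->empty && !b->all_parts)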
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartitionScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ memset(keynullness, 0, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ keynullness[i] = -1;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j; /* array element index; must not clobber the outer loop's i */
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (j = 0; j < num_elems; j++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[j])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and generate its PartClauseSetOr. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartitionScanKeyInfo));
+ for (i = 0; i < n_eqkeys; i++)
+ keys->eqkeys[i] = partkey_datum_from_expr(eqkey_exprs[i]);
+ keys->n_eqkeys = n_eqkeys;
+
+ for (i = 0; i < n_minkeys; i++)
+ keys->minkeys[i] = partkey_datum_from_expr(minkey_exprs[i]);
+ keys->n_minkeys = n_minkeys;
+ keys->min_incl = min_incl;
+
+ for (i = 0; i < n_maxkeys; i++)
+ keys->maxkeys[i] = partkey_datum_from_expr(maxkey_exprs[i]);
+ keys->n_maxkeys = n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+
+ return n_eqkeys + n_minkeys + n_maxkeys + n_keynullness;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and return it
+ */
+static Datum
+partkey_datum_from_expr(const Expr *expr)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ return ((Const *) expr)->constvalue;
+
+ default:
+ elog(ERROR, "invalid expression for partition key");
+ }
+
+ Assert(false); /* should never get here! */
+ return 0;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause, if any, for each btree strategy
+ * number for this partition key column. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ leftarg_const = partkey_datum_from_expr(leftarg->constarg);
+ rightarg_const = partkey_datum_from_expr(rightarg->constarg);
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartitionScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
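For anyone reading the PartitionSet code above without the rest of the tree at hand, here is a minimal standalone sketch of the range-plus-bitmap union idea. It is not the patch's code: SketchSet, sketch_range, sketch_union and sketch_contains are hypothetical names, a uint64_t stands in for Bitmapset (so it assumes at most 64 partitions), and ranges that touch (max + 1 == min) are treated as mergeable, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's PartitionSet. */
typedef struct SketchSet
{
	bool		use_range;	/* true while the set is one contiguous range */
	int			min_idx;	/* inclusive; -1 when the range is empty */
	int			max_idx;	/* inclusive; -1 when the range is empty */
	uint64_t	others;		/* bit i set => partition i is in the set */
} SketchSet;

static SketchSet
sketch_range(int min_idx, int max_idx)
{
	SketchSet	s = {true, min_idx, max_idx, 0};

	return s;
}

static bool
sketch_range_empty(const SketchSet *s)
{
	return s->min_idx < 0 && s->max_idx < 0;
}

/* Move every index of s's range into its bitmap and clear the range. */
static void
sketch_range_to_bits(SketchSet *s)
{
	int			i;

	if (sketch_range_empty(s))
		return;
	for (i = s->min_idx; i <= s->max_idx; i++)
		s->others |= UINT64_C(1) << i;
	s->min_idx = s->max_idx = -1;
}

/* Union b into a, keeping the range form only while it stays contiguous. */
static void
sketch_union(SketchSet *a, const SketchSet *b)
{
	SketchSet	tmp = *b;

	if (!a->use_range)
		sketch_range_to_bits(&tmp);		/* a is already bitmap-only */
	else if (sketch_range_empty(a))
	{
		a->min_idx = tmp.min_idx;		/* nothing to merge, take b's range */
		a->max_idx = tmp.max_idx;
	}
	else if (!sketch_range_empty(&tmp))
	{
		if (tmp.min_idx <= a->max_idx + 1 && a->min_idx <= tmp.max_idx + 1)
		{
			/* Overlapping or touching: the union is still one range. */
			if (tmp.min_idx < a->min_idx)
				a->min_idx = tmp.min_idx;
			if (tmp.max_idx > a->max_idx)
				a->max_idx = tmp.max_idx;
		}
		else
		{
			/* A gap would appear: fall back to the bitmap form. */
			sketch_range_to_bits(a);
			sketch_range_to_bits(&tmp);
			a->use_range = false;
		}
	}
	a->others |= tmp.others;
}

/* Is partition idx (0..63) a member of the set? */
static bool
sketch_contains(const SketchSet *s, int idx)
{
	if (!sketch_range_empty(s) && idx >= s->min_idx && idx <= s->max_idx)
		return true;
	return ((s->others >> idx) & 1) != 0;
}
```

Starting from [0,1], a union with [2,3] stays in range form as [0,3]; a further union with [6,7] leaves a gap, so the sketch switches to the bitmap form.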
Attachment: 0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v5.patch (text/plain; charset=UTF-8)
From 878357a06082c5ca156edb27212693d8d959d8f1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8d508e549b..1d6d1d042c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3475,12 +3510,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3492,6 +3530,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3522,12 +3561,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3724,12 +3764,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3751,11 +3791,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3763,17 +3803,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3784,12 +3842,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3803,20 +3862,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3829,8 +3887,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
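To illustrate what carrying ndatums in the comparison argument buys, here is a minimal standalone sketch of a "greatest bound less than or equal to the probe" binary search where the probe may supply only a prefix of the key columns. It follows the lo/hi/mid shape visible in partition_bound_bsearch above, but everything else is hypothetical: SketchProbe, sketch_bound_cmp and sketch_bound_bsearch are made-up names, bounds are plain two-column int tuples, and there is no MINVALUE/MAXVALUE or collation handling.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NCOLS 2

/* Hypothetical probe: ndatums may be smaller than SKETCH_NCOLS. */
typedef struct SketchProbe
{
	const int  *datums;
	int			ndatums;
} SketchProbe;

/* Compare only the first probe->ndatums columns of bound with the probe. */
static int
sketch_bound_cmp(const int *bound, const SketchProbe *probe)
{
	int			i;

	for (i = 0; i < probe->ndatums; i++)
	{
		if (bound[i] < probe->datums[i])
			return -1;
		if (bound[i] > probe->datums[i])
			return 1;
	}
	return 0;
}

/*
 * Return the offset of the greatest bound that is <= the probe, or -1 if
 * every bound is greater. bounds must be sorted in ascending order.
 */
static int
sketch_bound_bsearch(int (*bounds)[SKETCH_NCOLS], int nbounds,
					 const SketchProbe *probe, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = sketch_bound_cmp(bounds[mid], probe);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

With bounds (1,10), (1,20), (2,10), (3,5), the full probe (1,15) returns offset 0 with is_equal false, while the one-column prefix probe (2) returns offset 2 with is_equal true, since only the first column is compared.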
Attachment: 0005-Implement-get_partitions_for_keys-v5.patch (text/plain; charset=UTF-8)
From 9ee1df62011c543cc558531593c300999237cb37 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 372 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 64 ++----
3 files changed, 392 insertions(+), 48 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1d6d1d042c..a4ce8db5e7 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2507,7 +2507,377 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartitionScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (is_equal is true), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff/maxoff set to -1 means none of the datums in PartitionBoundInfo
+ * satisfies minkeys/maxkeys. If both are set to a valid datum offset,
+ * that means there exists at least some datums (and hence partitions)
+ * satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * If the query doesn't specify either the lower or the upper
+ * bound, consider including the default partition in the
+ * result set, because the existing partitions may not cover
+ * all of the values that such an unbounded range contains.
+ *
+ * Also, if minoff != maxoff, there might be datums in that
+ * range that don't have a non-default partition assigned.
+ */
+ if (keys->n_minkeys == 0 || keys->n_maxkeys == 0 ||
+ minoff != maxoff)
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index cda067da3a..4c13a7ffa4 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -199,16 +199,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -506,8 +504,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -516,7 +512,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -530,16 +526,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -557,9 +551,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -569,9 +561,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -585,9 +575,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -626,16 +614,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -647,9 +633,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -676,8 +660,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -685,9 +669,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -727,9 +709,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -767,9 +747,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -781,9 +759,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It can perhaps be taught not to make that conclusion by taking into account
the default partition's partition constraint, which includes the constraint
inherited from the parent, viz. 1 <= col1 < 50001. To do that, it might
be possible to summon up predtest.c's powers to conclude from the default
partition's partition constraint that it cannot contain any keys < 1, but
then we'll have to frame up a clause expression describing the latter.
Generating such a clause expression can be a bit daunting for a
multi-column key. So, I haven't yet tried really hard to implement this.
Any thoughts on that?
I don't think we really want to get into theorem-proving here, because
it's slow. Whatever we're going to do we should be able to do without
that - keeping it in the form of btree-strategy + value. It doesn't
seem that hard. Suppose we're asked to select partitions from tprt
subject to (<, 10000). Well, we determine that some of the tprt_1
partitions may be relevant, so we tell tprt_1 to select partitions
subject to (>=, 1, <, 10000). We know to do that because we know that
10000 < 50000 and we know to include >= 1 because we haven't got any
lower bound currently at all. What's the problem?
In some sense it's tempting to say that this case just doesn't matter
very much; after all, subpartitioning on the same column used to
partition at the top level is arguably lame. But if we can get it
right in a relatively straightforward manner then let's do it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/10/26 20:34, Robert Haas wrote:
On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It can perhaps be taught not to make that conclusion by taking into account
the default partition's partition constraint, which includes the constraint
inherited from the parent, viz. 1 <= col1 < 50001. To do that, it might
be possible to summon up predtest.c's powers to conclude from the default
partition's partition constraint that it cannot contain any keys < 1, but
then we'll have to frame up a clause expression describing the latter.
Generating such a clause expression can be a bit daunting for a
multi-column key. So, I haven't yet tried really hard to implement this.
Any thoughts on that?
I don't think we really want to get into theorem-proving here, because
it's slow.
Just to be clear, I'm saying we could use theorem-proving (if at all) just
for the default partition.
Whatever we're going to do we should be able to do without
that - keeping it in the form of btree-strategy + value. It doesn't
seem that hard. Suppose we're asked to select partitions from tprt
subject to (<, 10000). Well, we determine that some of the tprt_1
partitions may be relevant, so we tell tprt_1 to select partitions
subject to (>=, 1, <, 10000). We know to do that because we know that
10000 < 50000 and we know to include >= 1 because we haven't got any
lower bound currently at all. What's the problem?
Hmm, that's interesting. With the approach that the patch currently
takes, (>= 1) wouldn't be passed down when selecting the partitions of
tprt_1. The source of values (+ btree strategy) to use to select
partitions is the same original set of clauses for all partitioned tables
in the tree, as the patch currently implements it. Nothing would get
added to that set (>= 1, as in this example), nor subtracted (such as
clauses that are trivially true).
I will think about this approach in general, and about how to use it to
solve this problem in particular.
In some sense it's tempting to say that this case just doesn't matter
very much; after all, subpartitioning on the same column used to
partition at the top level is arguably lame. But if we can get it
right in a relatively straightforward manner then let's do it.
Yeah, I tend to agree.
Thanks for the input.
Regards,
Amit
On Fri, Oct 27, 2017 at 3:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I don't think we really want to get into theorem-proving here, because
it's slow.
Just to be clear, I'm saying we could use theorem-proving (if at all) just
for the default partition.
I don't really see why it should be needed there either. We've got
all the bounds in order, so we should know where there are any gaps
that are covered by the default partition in the range we care about.
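As a rough illustration of that idea (a sketch, not code from the patches): assume the sorted bounds carry a parallel array, here called indexes, in which entry i names the partition whose upper bound is bound i, or holds -1 when the values just below bound i belong to no non-default partition. Deciding whether the default partition is needed for a range of offsets is then a linear scan. The function name range_has_gap and the plain int array are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * indexes[i] is the partition whose upper bound is the i-th sorted bound,
 * or -1 if the values just below that bound have no non-default partition.
 * Returns true if any such gap lies within offsets minoff..maxoff.
 */
static bool
range_has_gap(const int *indexes, int ndatums, int minoff, int maxoff)
{
    assert(minoff >= 0 && maxoff < ndatums);

    for (int i = minoff; i <= maxoff; i++)
    {
        if (indexes[i] < 0)
            return true;    /* the default partition must be scanned too */
    }

    return false;
}
```

For example, partitions [1, 10) and [20, 30) give the bounds 1, 10, 20, 30 and the indexes -1, 0, -1, 1. A query spanning offsets 1 through 3 crosses the unassigned values 10 to 19, so the default partition has to be included.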
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017/10/27 13:57, Robert Haas wrote:
On Fri, Oct 27, 2017 at 3:17 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I don't think we really want to get into theorem-proving here, because
it's slow.
Just to be clear, I'm saying we could use theorem-proving (if at all) just
for the default partition.
I don't really see why it should be needed there either. We've got
all the bounds in order, so we should know where there are any gaps
that are covered by the default partition in the range we care about.
Sorry, I forgot to add: "...just for the default partition, for cases like
the one in Beena's example."
In normal cases, default partition selection doesn't require any
theorem-proving. It proceeds in a straightforward manner more or less
like what you said it should.
After thinking more on it, I think there is a rather straightforward trick
to implement the idea you mentioned to get this working for the case
presented in Beena's example, which works as follows:
For any non-root partitioned table, we add its partition constraint
clauses to the query-provided list of clauses and use the whole
list to drive the partition-pruning algorithm. So, when partition-pruning
runs for tprt_1, along with (< 10000) which the original query provides,
we also have (>= 1) which comes from the partition constraint of tprt_1
(which is >= 1 and < 50000). Note that the new code makes sure that the
(< 50000) coming from the constraint is overridden by the
more restrictive (< 10000) coming from the original query.
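To make the tprt_1 example concrete, here is a minimal sketch of merging the two sets of bounds for a single integer key. It is not the patch's code: the KeyRange struct and tighten_range function are hypothetical names invented for this illustration, and the actual patch works on lists of clauses rather than on a struct like this.

```c
#include <assert.h>
#include <stdbool.h>

/* One-column key range; a missing bound means unbounded on that side. */
typedef struct KeyRange
{
    bool has_lower;
    int  lower;
    bool lower_incl;
    bool has_upper;
    int  upper;
    bool upper_incl;
} KeyRange;

/*
 * Combine the query's bounds with the partition constraint's bounds,
 * keeping whichever bound is more restrictive on each side.
 */
static KeyRange
tighten_range(KeyRange query, KeyRange constraint)
{
    KeyRange r = query;

    /* Larger lower bound wins; on a tie, an exclusive bound is tighter. */
    if (constraint.has_lower &&
        (!r.has_lower || constraint.lower > r.lower ||
         (constraint.lower == r.lower && !constraint.lower_incl)))
    {
        r.has_lower = true;
        r.lower = constraint.lower;
        r.lower_incl = constraint.lower_incl;
    }

    /* Smaller upper bound wins; on a tie, an exclusive bound is tighter. */
    if (constraint.has_upper &&
        (!r.has_upper || constraint.upper < r.upper ||
         (constraint.upper == r.upper && !constraint.upper_incl)))
    {
        r.has_upper = true;
        r.upper = constraint.upper;
        r.upper_incl = constraint.upper_incl;
    }

    /* Tightening must never loosen a bound the query already had. */
    assert(!query.has_lower || r.lower >= query.lower);
    assert(!query.has_upper || r.upper <= query.upper);

    return r;
}
```

Feeding in the query's (< 10000) and the constraint's (>= 1, < 50000) gives (>= 1, < 10000): the lower bound is picked up from the constraint, while the query's tighter upper bound is kept.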
Thanks,
Amit
On Thu, Oct 26, 2017 at 4:47 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Meanwhile, attached updated set of patches including fixes for the typos
you reported in the other message. Updated 0005 fixes the first bug (the
Case 1 in your email), while other patches 0002-0004 are updated mostly to
fix the reported typos. A couple of tests are added in 0001 to test the
default partition case a bit more.
Hi Amit,
While testing this feature further, I ran into a bug with foreign-table
partitions. The test case is given below. Take a look.
CREATE EXTENSION postgres_fdw;
CREATE SERVER fp_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname
'postgres', port '5432', use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER fp_server;
CREATE TABLE fplt1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE fplt1_p1 (a int, b int, c text);
CREATE FOREIGN TABLE ftplt1_p1 PARTITION OF fplt1 FOR VALUES IN ('0000',
'0001', '0002', '0003') SERVER fp_server OPTIONS (TABLE_NAME 'fplt1_p1');
CREATE TABLE fplt1_p2 (a int, b int, c text);
CREATE FOREIGN TABLE ftplt1_p2 PARTITION OF fplt1 FOR VALUES IN ('0004',
'0005', '0006', '0007') SERVER fp_server OPTIONS (TABLE_NAME 'fplt1_p2');
INSERT INTO fplt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM
generate_series(0, 198, 2) i;
INSERT INTO fplt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM
generate_series(200, 398, 2) i;
--PG-HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2 WHERE t2.c = t1.c ) q;
QUERY PLAN
--------------------------------------------------
Nested Loop
-> Append
-> Foreign Scan on ftplt1_p1 t1
-> Foreign Scan on ftplt1_p2 t1_1
-> Unique
-> Append
-> Foreign Scan on ftplt1_p1 t2
-> Foreign Scan on ftplt1_p2 t2_1
(8 rows)
--PG-HEAD +v5 patches
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2 WHERE t2.c = t1.c ) q;
ERROR: invalid expression for partition key
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Thanks Rajkumar for the test case.
On 2017/10/27 17:05, Rajkumar Raghuwanshi wrote:
While testing this feature further, I ran into a bug with foreign-table
partitions. The test case is given below. Take a look.
[ ... ]
--PG-HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2 WHERE t2.c = t1.c ) q;
QUERY PLAN
--------------------------------------------------
Nested Loop
-> Append
-> Foreign Scan on ftplt1_p1 t1
-> Foreign Scan on ftplt1_p2 t1_1
-> Unique
-> Append
-> Foreign Scan on ftplt1_p1 t2
-> Foreign Scan on ftplt1_p2 t2_1
(8 rows)
--PG-HEAD +v5 patches
postgres=# EXPLAIN (COSTS OFF) SELECT t1.c FROM fplt1 t1, LATERAL (SELECT
DISTINCT t2.c FROM fplt1 t2 WHERE t2.c = t1.c ) q;
ERROR: invalid expression for partition key
I looked at this and it seems the error occurs not because the partitions
are foreign tables, but because the new code wrongly assumes that
Param nodes can never appear in the clauses coming from baserestrictinfo.
When trying to do the plan-time pruning for the partitioned table
appearing inside the lateral subquery, there are Params in the clauses in
baserestrictinfo that the new pruning code was unprepared to handle.
Fixed the code to instead give up on plan-time pruning in such a case.
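The shape of that check can be sketched as follows. This is a toy model for illustration only, not the code in the updated 0003 patch: ToyExpr, ToyTag and both function names are invented here and merely stand in for the planner's real expression nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for expression node kinds (not PostgreSQL's node types). */
typedef enum ToyTag
{
    TOY_CONST,
    TOY_VAR,
    TOY_PARAM,
    TOY_OP
} ToyTag;

typedef struct ToyExpr
{
    ToyTag tag;
    const struct ToyExpr *left;     /* operands, used only by TOY_OP */
    const struct ToyExpr *right;
} ToyExpr;

/* Recursively look for a parameter anywhere inside the expression. */
static bool
toy_contains_param(const ToyExpr *expr)
{
    if (expr == NULL)
        return false;
    if (expr->tag == TOY_PARAM)
        return true;
    return toy_contains_param(expr->left) ||
           toy_contains_param(expr->right);
}

/*
 * A clause is usable for plan-time pruning only if its value is fully
 * known while planning, i.e. it contains no parameter.
 */
static bool
toy_can_prune_at_plan_time(const ToyExpr *clause)
{
    assert(clause != NULL);
    return !toy_contains_param(clause);
}
```

In this model a clause such as "c = 'abc'" passes the check, while "t2.c = t1.c" inside the lateral subquery does not once t1.c has become a parameter, so pruning is skipped for it instead of raising an error.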
Attached is an updated set of patches. In addition to fixing the above bug,
it also fixes one of the cases reported by Beena regarding default partition
pruning that I had given up on yesterday as being too difficult to
implement [1], but today found out is not that difficult to do [2].
Change summary:
0001: added some new tests
0002: no change
0003: fixed issue that Rajkumar reported (cope with Params properly)
0004: no change
0005: fix the case to prune the default partition when warranted (the
issue reported by Beena)
Thanks,
Amit
[1]: /messages/by-id/0d6096e8-7c7b-afed-71d3-dca151306626@lab.ntt.co.jp
[2]: /messages/by-id/8499324c-8a33-4be7-9d23-7e6a95e60ddf@lab.ntt.co.jp
Attachments:
0001-Add-new-tests-for-partition-pruning-v6.patch (text/plain; charset=UTF-8)
From 575c88e4616b16d0a56bfc6f04bdf22c4976fa8b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 918 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 135 +++++
4 files changed, 1055 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..06e4b52632
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,918 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default
+ Filter: ((a)::numeric = '1'::numeric)
+(25 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 10)
+(7 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default
+ Filter: (a > 10)
+(21 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default
+ Filter: (a < 15)
+(7 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 15)
+(15 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(15 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+-------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default
+ Filter: (a <= 31)
+(25 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(19 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default
+ Filter: (a >= 29)
+(9 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default
+ Filter: ((a > 1) AND (a >= 15))
+(21 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..d79db585b8
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,135 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default;
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v6.patch (text/plain; charset=UTF-8)
From d5b7ba8b57284cc2a4faab595f58fba7765e6b62 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning, although
as of this commit this contains *all* appinfos.
5. Code to handle the possibility that a partition RelOptInfo may
not have the basic information set (set_append_rel_size() does
that normally, but for partitioned tables, it will only do it
for the *live* partitions, but partitionwise-join code would
look at *all* partitions)
If the clauses identified in 2 above do not contain values
necessary to perform partition pruning, do not call
get_partitions_from_clauses() right away. Instead, store the clauses
(somewhere, such as in the Append plan node) until such a time as
when all the "constant" values in them will be available. As of this
commit, we only pick up clauses from the baserestrictinfo list, so
it's safe to assume that each of the matched clauses will provide the
constant value needed for pruning.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
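
As a rough illustration of how a caller might consume the three outputs of get_partitions_from_clauses, here is a minimal, self-contained sketch that flattens a contiguous index range plus a set of "other" partition indexes into one sorted array, roughly what get_append_rel_partitions does in this patch. The names collect_partition_indexes and cmp_int are hypothetical and not part of the patch, and a plain uint64_t bitmask is assumed as a stand-in for Bitmapset, so only indexes 0..63 can be represented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparator for qsort(), same shape as intcmp() in the patch. */
static int
cmp_int(const void *va, const void *vb)
{
    int a = *(const int *) va;
    int b = *(const int *) vb;

    if (a == b)
        return 0;
    return (a > b) ? 1 : -1;
}

/*
 * Hypothetical helper (not in the patch): write the indexes in
 * [min_idx, max_idx] plus every bit set in other_parts into out[], sorted
 * in ascending order.  A negative min_idx or max_idx means "no contiguous
 * range".  other_parts is a uint64_t standing in for a Bitmapset.
 *
 * Returns the number of indexes written, or -1 if out[] is too small.
 */
int
collect_partition_indexes(int min_idx, int max_idx, uint64_t other_parts,
                          int *out, int out_cap)
{
    int n = 0;
    int i;

    assert(out != NULL);

    /* First the contiguous range, if there is one. */
    if (min_idx >= 0 && max_idx >= 0)
    {
        for (i = min_idx; i <= max_idx; i++)
        {
            if (n >= out_cap)
                return -1;
            out[n++] = i;
        }
    }

    /* Then the stragglers that fall outside the range. */
    for (i = 0; i < 64; i++)
    {
        if (other_parts & (UINT64_C(1) << i))
        {
            if (n >= out_cap)
                return -1;
            out[n++] = i;
        }
    }

    /* Keep partition-descriptor order so appinfos can be fetched in order. */
    if (n > 1)
        qsort(out, n, sizeof(int), cmp_int);

    return n;
}
```

With the values the stub returns (0, nparts - 1, and an empty set), such a helper would simply yield every partition index, which matches the "returns all partitions" behavior described above.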
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 694 ++++++++++++++++++++++++++++------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 +++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 776 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 07fdf66c38..f8da91d0fe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..e4427b8a58 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,15 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static BoolExpr *process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop);
/*
@@ -834,6 +845,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +868,488 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned in
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including any NullTest
+ * clauses against partition keys. constfalse is set if a constant-false
+ * clause is found, in which case no partition needs to be scanned.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == childrte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result list as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ * If it is, we turn this into an OR BoolExpr: (key < val OR
+ * key > val), if the partitioning method supports such a
+ * notion of inequality.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+
+ if (partscheme->strategy == PARTITION_STRATEGY_RANGE ||
+ partscheme->strategy == PARTITION_STRATEGY_LIST)
+ {
+ BoolExpr *ne_or;
+
+ ne_or = process_partition_ne_op(rel, negator,
+ partopfamily,
+ partcoll,
+ (Expr *) leftop,
+ (Expr *) rightop);
+ result = lappend(result, ne_or);
+ }
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * Check if saop_op is compatible with partitioning. If so and
+ * if this saop is of type 'key op ANY (...)', convert this into
+ * an OR BoolExpr.
+ */
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ negated = true;
+ }
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (j = 0; j < num_elems; j++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[j])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and add it to the result. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ result = lappend(result, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+static BoolExpr *
+process_partition_ne_op(RelOptInfo *rel,
+ Oid negator, Oid partopfamily, Oid partcoll,
+ Expr *leftop, Expr *rightop)
+{
+ Expr *ltexpr,
+ *gtexpr;
+ Oid ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily, lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+
+ return (BoolExpr *) makeBoolExpr(OR_EXPR, list_make2(ltexpr, gtexpr), -1);
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1364,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1378,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1415,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1428,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1438,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1608,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1714,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1807,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list contains only its
+ * immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate them to our
+ * list before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of root partitioned tables, get the
+ * partitioned_rels list by combining the live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1857,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even a dead partition look
+ * minimally valid and give it a valid dummy path.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
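The 0003 patch that follows describes a set of partitions as an inclusive index range plus a bitmap of indexes outside that range. As a rough, standalone illustration of that idea (not the patch's code), here is a minimal sketch of the union step. Everything named `Demo*`/`demo_*` is hypothetical, a 64-bit mask stands in for a Bitmapset (so indexes must be below 64), and the sketch assumes two ranges count as adjacent when one ends exactly one index before the other begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a partition set: inclusive range + bitmask. */
typedef struct DemoPartSet
{
    int      min_idx;   /* -1 when the range part is empty */
    int      max_idx;   /* -1 when the range part is empty */
    uint64_t others;    /* bit i set => partition i is in the set */
} DemoPartSet;

static bool
demo_range_empty(const DemoPartSet *s)
{
    return s->min_idx < 0 && s->max_idx < 0;
}

/* Expand the inclusive range into bits (assumes indexes below 64). */
static uint64_t
demo_range_bits(const DemoPartSet *s)
{
    uint64_t bits = 0;
    int      i;

    if (demo_range_empty(s))
        return 0;
    assert(s->min_idx >= 0 && s->max_idx < 64);
    for (i = s->min_idx; i <= s->max_idx; i++)
        bits |= (uint64_t) 1 << i;
    return bits;
}

/* Is partition idx a member, via either the range or the bitmask? */
static bool
demo_contains(const DemoPartSet *s, int idx)
{
    if (idx < 0 || idx >= 64)
        return false;
    return ((demo_range_bits(s) | s->others) >> idx) & 1;
}

/*
 * Union of two sets.  Ranges that overlap or are adjacent stay a single
 * range; otherwise both ranges are folded into the bitmask and the range
 * part of the result is left empty.
 */
static DemoPartSet
demo_union(DemoPartSet a, DemoPartSet b)
{
    DemoPartSet r;

    r.others = a.others | b.others;
    if (demo_range_empty(&a))
    {
        r.min_idx = b.min_idx;
        r.max_idx = b.max_idx;
    }
    else if (demo_range_empty(&b))
    {
        r.min_idx = a.min_idx;
        r.max_idx = a.max_idx;
    }
    else if (a.min_idx <= b.max_idx + 1 && b.min_idx <= a.max_idx + 1)
    {
        /* overlapping or adjacent: one contiguous range suffices */
        r.min_idx = a.min_idx < b.min_idx ? a.min_idx : b.min_idx;
        r.max_idx = a.max_idx > b.max_idx ? a.max_idx : b.max_idx;
    }
    else
    {
        /* disjoint: no longer contiguous, fall back to the bitmask */
        r.others |= demo_range_bits(&a) | demo_range_bits(&b);
        r.min_idx = r.max_idx = -1;
    }
    return r;
}
```

For instance, the union of ranges [0, 2] and [3, 5] stays a range, while the union of [0, 2] and [7, 8] ends up entirely in the bitmask. Keeping the range form whenever possible means a caller can walk a contiguous run of partitions without scanning a bitmap.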
0003-Implement-get_partitions_from_clauses-v6.patch (text/plain; charset=UTF-8)
From 2e0d01f82c7cc8e04c923e0ce23c22809b859f22 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1080 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1076 insertions(+), 4 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8da91d0fe..35e7d871ee 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set). Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1439,15 +1554,972 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ List *partconstr = RelationGetPartitionQual(relation);
+ PartitionSet *partset;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ partclauses = list_concat(partclauses, partconstr);
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
+
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, rt_index,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (b->all_parts)
+ return a;
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if(!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound
+ * for the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Minimum and maximum bounds could
+ * contain bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc = palloc0(sizeof(PartClause));
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Does leftop match with this partition key column? */
+ if ((IsA(leftop, Var) && partattno != 0 &&
+ ((Var *) leftop)->varattno == partattno) ||
+ equal(leftop, partexpr))
+ {
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args);
+ Const *arrconst = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arrconst->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ List *elem_exprs;
+ bool negated = false;
+
+ /*
+ * We would've accepted this saop only if its operator's
+ * negator was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /* Build clauses for the individual values in the array. */
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ elem_exprs = NIL;
+ for (i = 0; i < num_elems; i++)
+ {
+ Expr *elem_expr;
+
+ if (!elem_nulls[i])
+ {
+ Const *rightop;
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ rightop = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arrconst->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_expr = (Expr *) opexpr;
+ }
+ else
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_expr = (Expr *) nulltest;
+ }
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+
+ /* Build the OR clause and generate its PartClauseSetOr. */
+ if (saop->useOr)
+ {
+ BoolExpr *orexpr;
+
+ Assert(elem_exprs != NIL);
+ orexpr = (BoolExpr *) makeBoolExpr(OR_EXPR, elem_exprs,
+ -1);
+ *or_clauses = lappend(*or_clauses, orexpr);
+ }
+ else
+ /*
+ * To be ANDed with the clauses in the original list, just
+ * like what we do for the arguments of Boolean AND clause
+ * above.
+ */
+ clauses = list_concat(clauses, elem_exprs);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Otherwise, discard the equal keys
+ * and rely on minkeys and maxkeys alone.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for each btree strategy number
+ * for this partition key column. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
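For readers following along, the redundant-clause elimination that remove_redundant_clauses() performs in the patch above can be illustrated with a small standalone sketch. This is not the patch's code: it assumes plain int constants instead of Datums, and SimpleClause, apply_strategy(), reduce_clauses() and the ST_* constants are hypothetical names standing in for PartClause, partition_cmp_args() and the BT*StrategyNumber values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers (1..5). */
enum { ST_LT = 1, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX = ST_GT };

/* A clause of the form "key <op> value", with an int constant. */
typedef struct SimpleClause
{
    int strategy;   /* one of ST_LT .. ST_GT */
    int value;
} SimpleClause;

/* Evaluate "left <op> right" for the given strategy. */
static bool
apply_strategy(int strategy, int left, int right)
{
    switch (strategy)
    {
        case ST_LT: return left < right;
        case ST_LE: return left <= right;
        case ST_EQ: return left == right;
        case ST_GE: return left >= right;
        case ST_GT: return left > right;
    }
    return false;
}

/*
 * Reduce n ANDed clauses on one key column.  On return, best[s - 1] points
 * to the surviving clause of strategy s, or is NULL.  Returns false if the
 * clauses are found to be contradictory.
 */
static bool
reduce_clauses(const SimpleClause *clauses, size_t n,
               const SimpleClause *best[ST_MAX])
{
    const SimpleClause *eq;
    size_t i;
    int s;

    for (s = 0; s < ST_MAX; s++)
        best[s] = NULL;

    /* Keep only the most restrictive clause of each strategy. */
    for (i = 0; i < n; i++)
    {
        const SimpleClause *cur = &clauses[i];

        assert(cur->strategy >= ST_LT && cur->strategy <= ST_GT);
        s = cur->strategy - 1;
        if (best[s] == NULL ||
            apply_strategy(cur->strategy, cur->value, best[s]->value))
            best[s] = cur;
        else if (cur->strategy == ST_EQ)
            return false;       /* e.g. a = 3 AND a = 4 */
    }

    /* An equality clause makes every compatible inequality redundant. */
    eq = best[ST_EQ - 1];
    if (eq != NULL)
    {
        for (s = 0; s < ST_MAX; s++)
        {
            if (best[s] == NULL || s == ST_EQ - 1)
                continue;
            if (!apply_strategy(s + 1, eq->value, best[s]->value))
                return false;   /* e.g. a = 3 AND a > 3 */
            best[s] = NULL;
        }
    }

    /* Keep only one of <, <= and only one of >, >=. */
    if (best[ST_LT - 1] && best[ST_LE - 1])
    {
        if (best[ST_LT - 1]->value <= best[ST_LE - 1]->value)
            best[ST_LE - 1] = NULL;
        else
            best[ST_LT - 1] = NULL;
    }
    if (best[ST_GT - 1] && best[ST_GE - 1])
    {
        if (best[ST_GT - 1]->value >= best[ST_GE - 1]->value)
            best[ST_GE - 1] = NULL;
        else
            best[ST_GT - 1] = NULL;
    }
    return true;
}
```

With the clauses a < 10, a < 5, a <= 5 and a > 1, the sketch keeps only a < 5 and a > 1. Unlike the patch, it never has to fall back to "couldn't compare", because all constants share one type here.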
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v6.patch (text/plain; charset=UTF-8)
From eb76d1a20bbc002c9c994a3177b0eebef83f025e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 35e7d871ee..d9f12c5f42 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3519,12 +3554,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3536,6 +3574,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3566,12 +3605,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3768,12 +3808,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3795,11 +3835,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3807,17 +3847,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3828,12 +3886,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3847,20 +3906,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3873,8 +3931,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
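To make the point of passing ndatums concrete, here is a rough standalone sketch of a "greatest bound <= probe" binary search where the probe may supply only a prefix of the key columns. It is only an illustration under simplifying assumptions: bounds are rows of plain ints, and ProbeArg, bound_cmp(), bound_bsearch() and NKEYS are hypothetical names, not the PartitionBoundCmpArg, partition_bound_cmp() and partition_bound_bsearch() of the patch.

```c
#include <assert.h>

#define NKEYS 2     /* hypothetical number of partition key columns */

/* Hypothetical analogue of a non-bound PartitionBoundCmpArg. */
typedef struct ProbeArg
{
    const int *datums;  /* probe values, one per supplied column */
    int ndatums;        /* may be less than NKEYS (a prefix) */
} ProbeArg;

/* Compare a bound with the probe on the first arg->ndatums columns only. */
static int
bound_cmp(const int bound[NKEYS], const ProbeArg *arg)
{
    int i;

    assert(arg->ndatums >= 0 && arg->ndatums <= NKEYS);
    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] < arg->datums[i])
            return -1;
        if (bound[i] > arg->datums[i])
            return 1;
    }
    return 0;
}

/*
 * Return the offset of a bound that is <= the probe such that no later
 * bound is strictly less than the probe, or -1 if all bounds are greater.
 * bounds[] must be sorted in ascending order.  *is_equal is set to whether
 * the returned bound compares equal to the probe.
 */
static int
bound_bsearch(const int bounds[][NKEYS], int nbounds,
              const ProbeArg *arg, int *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = 0;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmpval = bound_cmp(bounds[mid], arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Note that with a prefix probe several adjacent bounds can compare equal, and the search stops at whichever of them it hits first. That is why a caller working with prefixes has to walk left or right from the returned offset, which is what the minkeys/maxkeys handling in the next patch does.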
0005-Implement-get_partitions_for_keys-v6.patch (text/plain; charset=UTF-8)
From 7f87307b03a1ce507909bc8e086e480532d55869 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 376 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 72 ++----
3 files changed, 398 insertions(+), 54 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d9f12c5f42..770a2a1ac9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2551,7 +2551,381 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Return the partition that eqkeys identified, if any. Otherwise,
+ * eqkeys matched no bound or fell in a range of unassigned values, so
+ * return the default partition if there is one, else an empty set.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (because, is_equal), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff/maxoff set to -1 means none of the datums in PartitionBoundInfo
+ * satisfies minkeys/maxkeys. If both are set to a valid datum offset,
+ * that means there exists at least some datums (and hence partitions)
+ * satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * If the query doesn't specify either the lower or the upper
+ * bound, consider including the default partition in the
+ * result set, because the existing partitions may not cover
+ * all of the values that such an unbounded range contains.
+ *
+ * Also, if minoff != maxoff, there might be datums in that
+ * range that don't have a non-default partition assigned.
+ */
+ if (keys->n_minkeys == 0 || keys->n_maxkeys == 0 ||
+ minoff != maxoff)
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 06e4b52632..5ca3c45b6b 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -204,16 +204,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
--------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -465,11 +463,9 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default
Filter: (a <= 31)
-(25 rows)
+(23 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -511,9 +507,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -583,8 +577,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -593,7 +585,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -607,16 +599,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -634,9 +624,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -646,9 +634,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -662,9 +648,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -703,16 +687,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -724,9 +706,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -753,8 +733,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -762,9 +742,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -804,9 +782,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -844,9 +820,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -858,9 +832,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
On Fri, Oct 27, 2017 at 2:41 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
0001: added some new tests
0002: no change
0003: fixed issue that Rajkumar reported (cope with Params properly)
0004: no change
0005: fix the case to prune the default partition when warranted (the
issue reported by Beena)
Thanks for the updated patch. I am getting a server crash with the query below.
CREATE TABLE mp (c1 int, c2 int, c3 int) PARTITION BY LIST(c3);
CREATE TABLE mp_p1 PARTITION OF mp FOR VALUES IN (10, 20) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p1_1 PARTITION OF mp_p1 FOR VALUES FROM (0) TO (200);
CREATE TABLE mp_p1_2 PARTITION OF mp_p1 FOR VALUES FROM (200) TO (400);
CREATE TABLE mp_p2 PARTITION OF mp FOR VALUES IN (30, 40) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p2_1 PARTITION OF mp_p2 FOR VALUES FROM (0) TO (300);
CREATE TABLE mp_p2_2 PARTITION OF mp_p2 FOR VALUES FROM (300) TO (600);
INSERT INTO mp VALUES(10, 100, 10);
INSERT INTO mp VALUES(20, 200, 20);
INSERT INTO mp VALUES(21, 150, 30);
INSERT INTO mp VALUES(30, 200, 40);
INSERT INTO mp VALUES(31, 300, 30);
INSERT INTO mp VALUES(40, 400, 40);
EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM mp WHERE c3 = 40 AND
c2 < 300;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Thanks Rajkumar.
On 2017/10/27 19:29, Rajkumar Raghuwanshi wrote:
On Fri, Oct 27, 2017 at 2:41 PM, Amit Langote wrote:
0001: added some new tests
0002: no change
0003: fixed issue that Rajkumar reported (cope with Params properly)
0004: no change
0005: fix the case to prune the default partition when warranted (the
issue reported by Beena)
Thanks for the updated patch. I am getting a server crash with the query below.
CREATE TABLE mp (c1 int, c2 int, c3 int) PARTITION BY LIST(c3);
CREATE TABLE mp_p1 PARTITION OF mp FOR VALUES IN (10, 20) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p1_1 PARTITION OF mp_p1 FOR VALUES FROM (0) TO (200);
CREATE TABLE mp_p1_2 PARTITION OF mp_p1 FOR VALUES FROM (200) TO (400);
CREATE TABLE mp_p2 PARTITION OF mp FOR VALUES IN (30, 40) PARTITION BY
RANGE(c2);
CREATE TABLE mp_p2_1 PARTITION OF mp_p2 FOR VALUES FROM (0) TO (300);
CREATE TABLE mp_p2_2 PARTITION OF mp_p2 FOR VALUES FROM (300) TO (600);
INSERT INTO mp VALUES(10, 100, 10);
INSERT INTO mp VALUES(20, 200, 20);
INSERT INTO mp VALUES(21, 150, 30);
INSERT INTO mp VALUES(30, 200, 40);
INSERT INTO mp VALUES(31, 300, 30);
INSERT INTO mp VALUES(40, 400, 40);
EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM mp WHERE c3 = 40 AND
c2 < 300;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Looks like an update I'd included in the last posted patch (viz. add the
non-root partitioned tables' partition constraint clauses to the list of
clauses used for pruning) exposed a bug in how ScalarArrayOpExpr clauses
are being handled by the new pruning code. A partitioned list partition's
internal partition constraint clause contains an ArrayExpr as the
ScalarArrayOpExpr's right-hand operand, whereas the pruning code assumed
there could only ever be a Const holding an ArrayType value.
Fixed in the attached updated patch, along with a new test in 0001 to
cover this case. Also, made a few tweaks to 0003 and 0005 (moved some
code from the former to the latter) around the handling of ScalarArrayOpExprs.
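To illustrate the kind of fix described above, here is a minimal,
self-contained sketch. It is not the code from the patch: SketchNode,
SketchTag and saop_collect_values() are hypothetical stand-ins for
PostgreSQL's Const, ArrayExpr and ScalarArrayOpExpr handling, and plain
ints stand in for Datums. It only shows the idea of accepting both shapes
of the right-hand operand and declining to prune for anything else,
instead of assuming a Const.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node tags, loosely modelled on PostgreSQL's; not the real ones. */
typedef enum
{
    SK_CONST_INT,       /* a single constant value */
    SK_CONST_ARRAY,     /* a constant holding a whole array value */
    SK_ARRAY_EXPR,      /* ARRAY[...] built from element expressions */
    SK_OTHER            /* anything else, e.g. a non-constant expression */
} SketchTag;

typedef struct SketchNode
{
    SketchTag   tag;
    int         value;      /* SK_CONST_INT: the constant */
    int         nelems;     /* array kinds: number of elements */
    const int  *datums;     /* SK_CONST_ARRAY: the stored values */
    const struct SketchNode *const *elements;   /* SK_ARRAY_EXPR: elements */
} SketchNode;

/*
 * Collect the values on the right-hand side of "key = ANY (...)".
 *
 * Returns the number of values written to out, or -1 when the operand
 * cannot be used for pruning; the caller would then keep every partition
 * rather than crash or prune incorrectly.
 */
int
saop_collect_values(const SketchNode *rightop, int *out, int maxout)
{
    int         i;

    assert(out != NULL);

    if (rightop == NULL || rightop->nelems > maxout)
        return -1;

    switch (rightop->tag)
    {
        case SK_CONST_ARRAY:
            /* The only case the buggy code expected. */
            for (i = 0; i < rightop->nelems; i++)
                out[i] = rightop->datums[i];
            return rightop->nelems;

        case SK_ARRAY_EXPR:
            /* Usable only if every element is itself a constant. */
            for (i = 0; i < rightop->nelems; i++)
            {
                const SketchNode *elem = rightop->elements[i];

                if (elem == NULL || elem->tag != SK_CONST_INT)
                    return -1;
                out[i] = elem->value;
            }
            return rightop->nelems;

        default:
            return -1;
    }
}
```

In this sketch a caller passes the clause's right-hand operand and a
buffer; a return value of -1 means "skip this clause for pruning", which
is the conservative choice because keeping extra partitions is always
safe while dropping one is not.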
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v7.patch (text/plain; charset=UTF-8)
From 83ca3621117c76375b8d52af2f70af0fe5350ce6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 947 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 138 +++++
4 files changed, 1087 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..44f8713319
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,947 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(29 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(23 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..5fd8b0cd14
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,138 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v7.patch (text/plain; charset=UTF-8)
From a4c506ce64f0a78381ebcc317e1343a22f148323 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (note that as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, the
planner can tell up front that a set of clauses identified as matching
the partition key do not yet contain the constant values, in which
case there is no need to call get_partitions_from_clauses right away.
Instead, the call should be deferred to another piece of code which
can receive the above list of clauses and which runs at a time when
the constant values become available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
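For illustration only: the stub reports its result as a contiguous range of
partition indexes plus a separate set of other indexes, and the caller merges
the two into one sorted list. Below is a minimal standalone sketch of that
merge in plain C, without PostgreSQL headers. The name collect_live_indexes
and the plain int array standing in for the Bitmapset are assumptions made
for this example, not code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for qsort(): ascending order of ints. */
static int
cmp_int(const void *va, const void *vb)
{
	int			a = *(const int *) va;
	int			b = *(const int *) vb;

	return (a > b) - (a < b);
}

/*
 * collect_live_indexes (hypothetical helper, not part of the patch)
 *
 * Merge the contiguous range [min_idx, max_idx] with the indexes in
 * others[0 .. nothers-1] into one ascending array.  A negative bound
 * means "no range".  Returns a malloc'd array that the caller frees,
 * or NULL if nothing was selected; *nresult receives the element count.
 */
static int *
collect_live_indexes(int min_idx, int max_idx,
					 const int *others, int nothers,
					 int *nresult)
{
	int			has_range = (min_idx >= 0 && max_idx >= min_idx);
	int			n;
	int		   *result;
	int			i,
				j = 0;

	assert(nothers >= 0);

	n = (has_range ? max_idx - min_idx + 1 : 0) + nothers;
	*nresult = n;
	if (n == 0)
		return NULL;

	result = malloc(n * sizeof(int));
	if (result == NULL)
	{
		*nresult = 0;
		return NULL;
	}

	/* Indexes from the contiguous range come first ... */
	if (has_range)
	{
		for (i = min_idx; i <= max_idx; i++)
			result[j++] = i;
	}
	/* ... followed by the stragglers outside the range. */
	for (i = 0; i < nothers; i++)
		result[j++] = others[i];

	/* Sort so the result follows partition-descriptor order. */
	qsort(result, n, sizeof(int), cmp_int);
	return result;
}
```

With the stub as described, the range would cover every partition and the
extra set would be empty, so the merged list is simply 0 .. nparts - 1.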
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 24 ++
src/backend/optimizer/path/allpaths.c | 566 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 ++++++
src/include/catalog/partition.h | 5 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 648 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 66ec214e02..31c47d23e1 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,30 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..aca372a0d2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,12 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
/*
@@ -834,6 +842,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +865,363 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned in
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * an invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, rel->relid, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely
+ * about the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it
+ * if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1236,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1250,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1287,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1300,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1310,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1480,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1586,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1679,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list contains only its
+ * immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate them to our
+ * list before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1729,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EC members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and have a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..5f55550952 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,9 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order
+ * as bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
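
Note on match_clauses_to_partkey() in the patch above: a minimal sketch of
the "flip the args" step, assuming a toy clause representation. ToyClause,
toy_commutator() and toy_match_partkey() are hypothetical names made up for
this illustration; the patch itself works on OpExpr nodes and uses
get_commutator(). The sketch only shows the idea that a clause of the form
(const op partkey) is rewritten as (partkey op' const), where op' is the
commutator of op, and that the clause is given up on if no commutator exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an OpExpr: operand names and an operator symbol. */
typedef struct ToyClause
{
	const char *left;
	const char *op;
	const char *right;
} ToyClause;

/* Hypothetical commutator lookup; returns NULL when none is known. */
static const char *
toy_commutator(const char *op)
{
	static const char *pairs[][2] = {
		{"<", ">"}, {">", "<"}, {"<=", ">="},
		{">=", "<="}, {"=", "="}, {"<>", "<>"}
	};
	size_t		i;

	for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
	{
		if (strcmp(op, pairs[i][0]) == 0)
			return pairs[i][1];
	}
	return NULL;
}

/*
 * Put the clause into (partkey op const) form.  Returns false if neither
 * operand is the partition key, or if the key is on the right and the
 * operator has no commutator; the clause is left untouched in that case.
 */
static bool
toy_match_partkey(ToyClause *clause, const char *partkey)
{
	assert(clause != NULL && partkey != NULL);

	if (strcmp(clause->left, partkey) == 0)
		return true;			/* already in the expected form */

	if (strcmp(clause->right, partkey) == 0)
	{
		const char *commuted = toy_commutator(clause->op);
		const char *tmp;

		if (commuted == NULL)
			return false;		/* cannot flip the arguments */

		tmp = clause->left;
		clause->left = clause->right;
		clause->right = tmp;
		clause->op = commuted;
		return true;
	}

	return false;				/* neither argument is the partition key */
}
```

For example, passing {"10", "<", "a"} with partition key "a" rewrites the
clause in place to {"a", ">", "10"}, so that later code can always look for
the constant on the right-hand side.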
Attachment: 0003-Implement-get_partitions_from_clauses-v7.patch (text/plain; charset=UTF-8)
From 4a90dd60f7e31af25e6240b83dc121ff15463552 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1151 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1147 insertions(+), 4 deletions(-)
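
Note for reviewers: a minimal sketch of how a caller might interpret the
three outputs of get_partitions_from_clauses() (min_part_idx, max_part_idx,
other_parts). It is an illustration under stated assumptions, not code from
this patch: toy_expand_partitions() is a hypothetical helper, and a uint64_t
bitmask stands in for the Bitmapset, so it only handles up to 64 partitions.
As in the function below, min_part_idx = max_part_idx = -1 means that no
index range was selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Write the indexes of all selected partitions into out[] (which must have
 * room for nparts entries) in ascending order and return how many were
 * written.  A partition is selected if it lies within [min_part_idx,
 * max_part_idx] or if its bit is set in other_parts.
 */
static int
toy_expand_partitions(int min_part_idx, int max_part_idx,
					  uint64_t other_parts, int nparts, int *out)
{
	int			count = 0;
	int			i;

	/* The bitmask stand-in limits this sketch to 64 partitions. */
	assert(nparts >= 0 && nparts <= 64);

	for (i = 0; i < nparts; i++)
	{
		bool		in_range = (min_part_idx >= 0 &&
								i >= min_part_idx && i <= max_part_idx);
		bool		in_other = ((other_parts >> i) & 1) != 0;

		if (in_range || in_other)
			out[count++] = i;
	}

	return count;
}
```

With 8 partitions, a range of 1..3 plus bit 6 set in other_parts selects
partitions 1, 2, 3 and 6; the "empty" result (-1, -1, no bits set) selects
none, and the "all partitions" result (0, nparts - 1, no bits set) selects
every partition.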
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c47d23e1..41c30f9327 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes,
+ * inclusive, are part of the set). Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include the default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ int rt_index, List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1439,15 +1554,1043 @@ get_partitions_from_clauses(Relation relation, int rt_index,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ List *partconstr = RelationGetPartitionQual(relation);
+ PartitionSet *partset;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ partclauses = list_concat(partclauses, partconstr);
+ partset = get_partitions_from_clauses_guts(relation, rt_index,
+ partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
+
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * rt_index is the table's range table position needed to set varno of Vars
+ * contained in the table's partition constraint that is used in certain
+ * cases.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionSet *partset;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset = partset_new(true, false);
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ /*
+ * If this orarg refutes the table's partition constraint (if the
+ * table is a partition at all), don't go looking for its
+ * partitions, that is, leave the partition set we're building
+ * for this OR clause untouched.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partconstr = (List *) canonicalize_qual((Expr *) partconstr);
+ Assert(rt_index > 0);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+
+ /*
+ * NB: if the clause may contain Param, replace them with
+ * equivalent Vars before proceeding, because predtest.c does
+ * not know about Params.
+ */
+ if (predicate_refuted_by(partconstr,
+ list_make1(orarg), false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_guts(relation, rt_index,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (b->all_parts)
+ return a;
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if(!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound
+ * for the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else if (and_clause((Node *) clause))
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Skip if leftop doesn't match this partition key column. */
+ if ((!IsA(leftop, Var) ||
+ ((Var *) leftop)->varattno != partattno) &&
+ !equal(leftop, partexpr))
+ continue;
+
+ /*
+ * Deal with <> operators that the planner allows if it finds
+ * out that <>'s negator is indeed a valid partopfamily member.
+ * Make an equivalent OR expression and add to the *or_clauses
+ * list. That is, we convert a <> opclause into
+ * (leftop < rightop) OR (leftop > rightop).
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily) &&
+ (partkey->strategy == PARTITION_STRATEGY_RANGE ||
+ partkey->strategy == PARTITION_STRATEGY_LIST))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ negator = get_negator(opclause->opno);
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr),
+ -1));
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * Planner must have accepted this saop iff saop_op's negator
+ * was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: don't clobber the outer loop's i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for this partition key column
+ * for each btree strategy number. Copy them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
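
As a reading aid for the PartitionSet helpers in the patch above, here is a minimal standalone sketch (not the patch's code) of how two inclusive index ranges can be intersected and unioned. The names IdxRange, range_empty, range_overlap, range_adjacent, range_intersect and range_union are hypothetical, -1 is assumed to mark an empty range as in the patch, and adjacency is assumed to mean max + 1 == min. When the union cannot be expressed as one contiguous range, the sketch only reports that fact; the patch instead falls back to the other_parts bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for min_part_idx/max_part_idx; both ends inclusive. */
typedef struct IdxRange
{
	int			min;
	int			max;
} IdxRange;

/* An empty range is encoded as {-1, -1}. */
static bool
range_empty(IdxRange r)
{
	return r.min < 0 && r.max < 0;
}

/* True if the two ranges share at least one index. */
static bool
range_overlap(IdxRange a, IdxRange b)
{
	return !range_empty(a) && !range_empty(b) &&
		a.min <= b.max && b.min <= a.max;
}

/* True if one range ends exactly one index before the other begins. */
static bool
range_adjacent(IdxRange a, IdxRange b)
{
	return !range_empty(a) && !range_empty(b) &&
		(a.max + 1 == b.min || b.max + 1 == a.min);
}

/* Intersection of two ranges; empty if they do not overlap. */
static IdxRange
range_intersect(IdxRange a, IdxRange b)
{
	IdxRange	r = {-1, -1};

	if (range_overlap(a, b))
	{
		r.min = a.min > b.min ? a.min : b.min;
		r.max = a.max < b.max ? a.max : b.max;
	}
	return r;
}

/*
 * Union of two ranges.  Returns true and fills *out when the result is a
 * single contiguous range; returns false when a bitmap would be needed.
 */
static bool
range_union(IdxRange a, IdxRange b, IdxRange *out)
{
	assert(out != NULL);

	if (range_empty(a))
	{
		*out = b;
		return true;
	}
	if (range_empty(b))
	{
		*out = a;
		return true;
	}
	if (!range_overlap(a, b) && !range_adjacent(a, b))
		return false;

	out->min = a.min < b.min ? a.min : b.min;
	out->max = a.max > b.max ? a.max : b.max;
	return true;
}
```

For instance, under these assumptions the ranges {0, 1} and {2, 3} are adjacent and union to {0, 3}, while {0, 1} and {5, 6} cannot be merged into a single range.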
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v7.patch
From 6ec99b7dedbd348ba1969f602eb6737817fbff41 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
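
To illustrate the point of carrying the number of datums in the comparison argument, here is a minimal standalone sketch, assuming plain integer keys rather than real Datums and support functions. SketchDatum, SketchCmpArg and sketch_bound_cmp are hypothetical names that do not appear in the patch; the sketch also requires at least one datum, whereas the patch defines its own behavior for the zero-datum case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Datum; real code compares via support functions. */
typedef long SketchDatum;

/* Hypothetical, reduced analogue of the non-bound half of the argument. */
typedef struct SketchCmpArg
{
	const SketchDatum *datums;	/* values supplied by the caller */
	int			ndatums;		/* how many leading key columns they cover */
} SketchCmpArg;

/*
 * Compare a bound of natts columns against only the first arg->ndatums
 * columns of the probe.  Returns <0, 0, or >0 like a btree comparator.
 */
static int
sketch_bound_cmp(const SketchDatum *bound, int natts, const SketchCmpArg *arg)
{
	int			i;

	assert(bound != NULL && arg != NULL);
	assert(arg->ndatums >= 1 && arg->ndatums <= natts);

	for (i = 0; i < arg->ndatums; i++)
	{
		if (bound[i] < arg->datums[i])
			return -1;
		if (bound[i] > arg->datums[i])
			return 1;
	}

	/* All compared prefix columns are equal. */
	return 0;
}
```

With a bound of (10, 20) and a probe of (10, 25), comparing only the first column reports equality, while comparing both columns reports the bound as smaller. That prefix behavior is what a caller with constraints on only some leading key columns would rely on.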
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 41c30f9327..caf52f4210 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3590,12 +3625,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3607,6 +3645,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3637,12 +3676,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3839,12 +3879,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3866,11 +3906,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3878,17 +3918,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3899,12 +3957,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3918,20 +3977,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3944,8 +4002,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
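
A note for readers of 0004: the new n_tuple_datums argument lets a range bound be compared against only a prefix of the partition key, which is why several bounds can compare "equal" to the same probe. Below is a minimal sketch of that idea, assuming plain ints instead of Datums and ignoring MINVALUE/MAXVALUE and collations. prefix_cmp is a hypothetical name, not a function in the patch.

```c
#include <assert.h>

/*
 * Hypothetical simplification of partition_rbound_datum_cmp(): compare only
 * the first nprobe columns of a bound against the probe values.
 * Returns <0, 0 or >0.  As in the patch, the result starts out as -1, so an
 * empty probe reports the bound as "less".
 */
static int
prefix_cmp(const int *bound, const int *probe, int nprobe)
{
	int		i;
	int		cmpval = -1;

	assert(nprobe >= 0);
	for (i = 0; i < nprobe; i++)
	{
		cmpval = (bound[i] > probe[i]) - (bound[i] < probe[i]);
		if (cmpval != 0)
			break;			/* first differing column decides */
	}
	return cmpval;
}
```

With a three-column bound (1, 1, 1), both the one-column probe (1) and the two-column probe (1, 1) compare equal, so a search keyed on a prefix can match more than one bound.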
0005-Implement-get_partitions_for_keys-v7.patch (text/plain; charset=UTF-8)
From dec7931f2cca258a26d3d04afd3b89be333d5836 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 376 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 72 ++----
3 files changed, 398 insertions(+), 54 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index caf52f4210..6b663bb11f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2622,7 +2622,381 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (that is, is_equal is set), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bounds share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff/maxoff set to -1 means none of the datums in PartitionBoundInfo
+ * satisfies minkeys/maxkeys. If both are set to a valid datum offset,
+ * that means there exist at least some datums (and hence partitions)
+ * satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * If the query doesn't specify either the lower or the upper
+ * bound, consider including the default partition in the
+ * result set, because the existing partitions may not cover
+ * all of the values that such an unbounded range contains.
+ *
+ * Also, if minoff != maxoff, there might be datums in that
+ * range that don't have a non-default partition assigned.
+ */
+ if (keys->n_minkeys == 0 || keys->n_maxkeys == 0 ||
+ minoff != maxoff)
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 44f8713319..21267dad98 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -207,16 +207,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -482,15 +480,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -536,9 +532,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -612,8 +606,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -622,7 +614,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -636,16 +628,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -663,9 +653,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -675,9 +663,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -691,9 +677,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -732,16 +716,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -753,9 +735,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -782,8 +762,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -791,9 +771,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -833,9 +811,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -873,9 +849,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -887,9 +861,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
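
For readers following 0005: the pruning code relies on partition_bound_bsearch returning the greatest bound that is less than or equal to the probe (or -1 if there is none), after which the entry at the next position of the indexes array names the partition whose upper bound follows. The following is a rough sketch of that lookup for a single-column range key, assuming plain int bounds and an indexes array with one more entry than there are bounds, where -1 marks a range with no partition assigned. The names bound_bsearch and range_partition_for_value are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the offset of the greatest element of bounds[0..n-1] that is
 * <= probe, or -1 if every element is greater.  *is_equal reports whether
 * the bound at the returned offset equals probe.
 */
static int
bound_bsearch(const int *bounds, int n, int probe, bool *is_equal)
{
	int		lo = -1;
	int		hi = n - 1;

	assert(n >= 0);
	*is_equal = false;
	while (lo < hi)
	{
		/* Round up so that lo = mid always makes progress. */
		int		mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= probe)
		{
			lo = mid;
			*is_equal = (bounds[mid] == probe);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}

/*
 * Map a value to a partition index, or -1 if it falls in an unassigned
 * range (where a default partition, if any, would have to be scanned).
 * indexes[] must have n + 1 entries; indexes[i] is the partition whose
 * upper bound is bounds[i].
 */
static int
range_partition_for_value(const int *bounds, const int *indexes, int n,
						  int value)
{
	bool	equal;
	int		off = bound_bsearch(bounds, n, value, &equal);

	return indexes[off + 1];
}
```

For example, with partitions [1, 10) and [15, 20) the bounds are {1, 10, 15, 20} and the indexes are {-1, 0, -1, 1, -1}; the value 12 lands on offset 1 and therefore on indexes[2], the unassigned gap between 10 and 15.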
On 2017/10/30 14:55, Amit Langote wrote:
> Fixed in the attached updated patch, along with a new test in 0001 to
> cover this case. Also, made a few tweaks to 0003 and 0005 (moved some
> code from the former to the latter) around the handling of ScalarArrayOpExprs.
Sorry, I'd forgotten to include some changes.
In the previous versions, the RT index of the table had to be passed to
partition.c. I realized that is no longer needed, so I removed that
requirement from the interface. As a result, patches 0002 and 0003 have
changed in this version.
Thanks,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v8.patch (text/plain; charset=UTF-8)
From 83ca3621117c76375b8d52af2f70af0fe5350ce6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/5] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 947 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 138 +++++
4 files changed, 1087 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..44f8713319
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,947 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(29 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(23 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..5fd8b0cd14
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,138 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null; /* while we're on nulls */
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v8.patch (text/plain; charset=UTF-8)
From 521ddc6eaabf11b3cbbe7b0c8b02ddcf5bce60f2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/5] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (note that as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it's
obvious in the planner that a set of clauses identified as matching
the partition key don't contain the constant values right away, in
which case, there is no need to call get_partitions_from_clauses
right away. Instead, it should be deferred to another piece of code
which can receive the above list of clauses and runs at a time when
the constant values become available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
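Note for reviewers (not part of the commit message): the way
get_append_rel_partitions() combines the three outputs of
get_partitions_from_clauses() into one sorted list of partition indexes
can be sketched as standalone C. This is only an illustration under
stated assumptions: collect_part_indexes() is a hypothetical helper, a
plain unsigned bitmask stands in for the Bitmapset, malloc() stands in
for palloc(), and the contiguous range and the extra set are assumed to
be disjoint.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints, same shape as intcmp() in the patch */
static int
intcmp(const void *va, const void *vb)
{
	int			a = *((const int *) va);
	int			b = *((const int *) vb);

	if (a == b)
		return 0;
	return (a > b) ? 1 : -1;
}

/*
 * collect_part_indexes (hypothetical helper, for illustration only)
 *
 * Gather the contiguous range [min_idx, max_idx] plus every bit set in
 * other_mask into one ascending array.  Returns the number of indexes,
 * or -1 if allocation fails.  *out is NULL when nothing was selected;
 * otherwise the caller must free() it.
 */
static int
collect_part_indexes(int min_idx, int max_idx, unsigned int other_mask,
					 int **out)
{
	int			nbits = (int) (sizeof(other_mask) * 8);
	int			n = 0;
	int			j = 0;
	int			i;
	int		   *idx;

	*out = NULL;

	/* Count members of the range and of the extra set. */
	if (min_idx >= 0 && max_idx >= min_idx)
		n += max_idx - min_idx + 1;
	for (i = 0; i < nbits; i++)
		if (other_mask & (1u << i))
			n++;
	if (n == 0)
		return 0;

	idx = malloc(n * sizeof(int));
	if (idx == NULL)
		return -1;

	/* Copy the range first, then the extra members. */
	if (min_idx >= 0 && max_idx >= min_idx)
		for (i = min_idx; i <= max_idx; i++)
			idx[j++] = i;
	for (i = 0; i < nbits; i++)
		if (other_mask & (1u << i))
			idx[j++] = i;
	assert(j == n);

	/* Sort so the result follows partition descriptor order. */
	qsort(idx, j, sizeof(int), intcmp);
	*out = idx;
	return n;
}
```

The sort matters because the resulting indexes are used to look up
rel->part_appinfos[], which is kept in partition descriptor order.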
src/backend/catalog/partition.c | 23 ++
src/backend/optimizer/path/allpaths.c | 566 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 ++++++
src/include/catalog/partition.h | 4 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 646 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 66ec214e02..0ed5fbea48 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,29 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..50f869448d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,12 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
/*
@@ -834,6 +842,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +865,363 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it if
+ * its negator is indeed a part of the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1236,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1250,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1287,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1300,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1310,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1480,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1586,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1679,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1729,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..7da99a9f41 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,8 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
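
The attr_needed copying done by set_basic_child_rel_properties() in the patch above can be illustrated outside the planner. What follows is only a minimal sketch under stated assumptions: ToyRel, toy_copy_attr_needed() and the translated[] array are hypothetical stand-ins (for RelOptInfo, the loop in the patch, and appinfo->translated_vars respectively), and an unsigned bitmask stands in for a Relids set.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a RelOptInfo: attribute numbers run from
 * min_attr (<= 0 for system columns) to max_attr, and needed[] is indexed
 * by attno - min_attr.  An unsigned bitmask stands in for a Relids set.
 */
typedef struct ToyRel
{
    int         min_attr;
    int         max_attr;
    unsigned   *needed;
} ToyRel;

/*
 * Copy each parent attribute's "needed" set to the matching child attribute.
 * translated[attno - 1] is the child attribute number that corresponds to
 * the parent's user attribute attno (the role translated_vars plays).
 */
static void
toy_copy_attr_needed(const ToyRel *parent, ToyRel *child,
                     const int *translated)
{
    int         attno;

    for (attno = parent->min_attr; attno <= parent->max_attr; attno++)
    {
        unsigned    mask = parent->needed[attno - parent->min_attr];

        if (attno <= 0)
        {
            /* System attributes keep the same number in parent and child. */
            assert(parent->min_attr == child->min_attr);
            child->needed[attno - child->min_attr] = mask;
        }
        else
        {
            /* User attributes may sit at a different position in the child. */
            int         child_attno = translated[attno - 1];

            assert(child_attno >= child->min_attr &&
                   child_attno <= child->max_attr);
            child->needed[child_attno - child->min_attr] = mask;
        }
    }
}
```

The point of the sketch is that the same set is reused for parent and child; only the array slot changes when the child's column order differs from the parent's.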
Attachment: 0003-Implement-get_partitions_from_clauses-v8.patch (text/plain; charset=UTF-8)
From 240f3da72b90be9927472b88080dfd6ce67fec04 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/5] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1124 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1120 insertions(+), 4 deletions(-)
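
For readers who want to see the range-plus-bitmap idea behind PartitionSet in isolation, here is a rough standalone sketch. It is not the patch's code: ToySet and the toy_* functions are hypothetical names, an unsigned bitmask stands in for a Bitmapset (so at most 32 partitions), the empty/all_parts/use_range flags are left out, and adjacency is taken to mean that one inclusive range ends right before the other begins.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PARTS 32        /* bitmask width assumed by this sketch */

typedef struct ToySet
{
    int         min;            /* lowest index in the range, or -1 */
    int         max;            /* highest index in the range, or -1 */
    unsigned    others;         /* bit i set => partition i is in the set */
} ToySet;

static bool
toy_range_empty(const ToySet *s)
{
    return s->min < 0 && s->max < 0;
}

static bool
toy_range_overlap(const ToySet *a, const ToySet *b)
{
    return !toy_range_empty(a) && !toy_range_empty(b) &&
        a->min <= b->max && b->min <= a->max;
}

static bool
toy_range_adjacent(const ToySet *a, const ToySet *b)
{
    return !toy_range_empty(a) && !toy_range_empty(b) &&
        (a->max + 1 == b->min || b->max + 1 == a->min);
}

/* Expand the inclusive range into individual bits. */
static unsigned
toy_range_bits(const ToySet *s)
{
    unsigned    bits = 0;
    int         i;

    if (toy_range_empty(s))
        return 0;
    assert(s->min >= 0 && s->max < TOY_MAX_PARTS);
    for (i = s->min; i <= s->max; i++)
        bits |= 1u << i;
    return bits;
}

/* Union: merge ranges when contiguous, else fall back to the bitmask. */
static ToySet
toy_union(ToySet a, ToySet b)
{
    if (toy_range_empty(&a))
    {
        a.min = b.min;
        a.max = b.max;
    }
    else if (toy_range_empty(&b))
    {
        /* nothing to merge; keep a's range */
    }
    else if (toy_range_overlap(&a, &b) || toy_range_adjacent(&a, &b))
    {
        a.min = a.min < b.min ? a.min : b.min;
        a.max = a.max > b.max ? a.max : b.max;
    }
    else
    {
        /* Disjoint ranges: the result is no longer contiguous. */
        a.others |= toy_range_bits(&a) | toy_range_bits(&b);
        a.min = a.max = -1;
    }
    a.others |= b.others;
    return a;
}

/*
 * Intersection: range against range, bitmask against bitmask.  Members that
 * one set holds in its range and the other in its bitmask are not matched
 * up here; that is a limitation of this sketch.
 */
static ToySet
toy_intersect(ToySet a, ToySet b)
{
    if (toy_range_overlap(&a, &b))
    {
        a.min = a.min > b.min ? a.min : b.min;
        a.max = a.max < b.max ? a.max : b.max;
    }
    else
        a.min = a.max = -1;
    a.others &= b.others;
    return a;
}
```

With these definitions, the union of ranges [0,1] and [2,3] stays a single range [0,3], whereas the union of [0,1] and [5,6] drops the range and records partitions 0, 1, 5 and 6 in the bitmask.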
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0ed5fbea48..a845288127 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set.) Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1438,15 +1553,1016 @@ get_partitions_from_clauses(Relation relation, List *partclauses,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ List *partconstr = RelationGetPartitionQual(relation);
+ PartitionSet *partset;
+
+ partclauses = list_concat(partclauses, partconstr);
+ partset = get_partitions_from_clauses_guts(relation, partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * Clauses in the list are implicitly ANDed. Any OR clauses found among
+ * them are handled by recursing on each argument, taking the union of the
+ * results, and intersecting that with the set obtained from the rest.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, List *clauses)
+{
+ PartitionSet *partset;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset;
+
+ arg_partset = get_partitions_from_clauses_guts(relation,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->use_range = in->use_range;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (b->all_parts)
+ return a;
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound
+ * for the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Minimum and maximum bounds could
+ * contain bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that is false, *constfalse
+ * is set to true and no keys are set. It is also set if we encounter
+ * mutually contradictory clauses in this function ourselves, for example,
+ * having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Skip if leftop doesn't match this partition key column. */
+ if ((!IsA(leftop, Var) ||
+ ((Var *) leftop)->varattno != partattno) &&
+ !equal(leftop, partexpr))
+ continue;
+
+ /*
+ * Deal with <> operators that the planner allows if it finds
+ * out that <>'s negator is indeed a valid partopfamily member.
+ * Make an equivalent OR expression and add to the *or_clauses
+ * list. That is, we convert a <> opclause into
+ * (leftop < rightop) OR (leftop > rightop).
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily) &&
+ (partkey->strategy == PARTITION_STRATEGY_RANGE ||
+ partkey->strategy == PARTITION_STRATEGY_LIST))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ negator = get_negator(opclause->opno);
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr),
+ -1));
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * Planner must have accepted this saop iff saop_op's negator
+ * was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: array index; i tracks the key column */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for each btree strategy number
+ * for this partition key column. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
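
The clause reduction done by remove_redundant_clauses() above can be illustrated with a minimal standalone sketch. It is not the patch's code: it assumes a single int column, and Clause, cmp_ints() and reduce_clauses() are hypothetical names invented for the illustration. The enum values merely mirror the usual btree strategy numbering. The three passes follow the same order as the patch: per-strategy reduction, equality against everything else, then </<= and >/>=.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the five btree strategy numbers. */
enum { LT = 1, LE = 2, EQ = 3, GE = 4, GT = 5, NSTRAT = 5 };

typedef struct Clause
{
    int strategy;   /* one of LT..GT */
    int value;      /* constant operand, as in "a < value" */
} Clause;

/* Evaluate "left <strategy> right" for plain ints. */
static bool
cmp_ints(int strategy, int left, int right)
{
    switch (strategy)
    {
        case LT: return left < right;
        case LE: return left <= right;
        case EQ: return left == right;
        case GE: return left >= right;
        default: return left > right;   /* GT */
    }
}

/*
 * Reduce the clauses on one column.  On return, best[s - 1] points to the
 * surviving clause of strategy s, or is NULL.  Returns false if the
 * clauses contradict each other.
 */
bool
reduce_clauses(const Clause *clauses, int n, const Clause *best[NSTRAT])
{
    const Clause *eq;
    int i, s;

    for (s = 0; s < NSTRAT; s++)
        best[s] = NULL;

    /* Pass 1: keep the most restrictive clause of each strategy. */
    for (i = 0; i < n; i++)
    {
        const Clause *cur = &clauses[i];

        assert(cur->strategy >= LT && cur->strategy <= GT);
        s = cur->strategy - 1;
        if (best[s] == NULL ||
            cmp_ints(cur->strategy, cur->value, best[s]->value))
            best[s] = cur;
        else if (cur->strategy == EQ)
            return false;       /* e.g. a = 1 AND a = 3 */
    }

    /* Pass 2: an equality clause subsumes or contradicts the rest. */
    eq = best[EQ - 1];
    for (s = 0; eq != NULL && s < NSTRAT; s++)
    {
        if (best[s] == NULL || s == EQ - 1)
            continue;
        if (!cmp_ints(best[s]->strategy, eq->value, best[s]->value))
            return false;       /* e.g. a = 1 AND a > 5 */
        best[s] = NULL;         /* redundant next to the equality */
    }

    /* Pass 3: keep only one of <, <= and only one of >, >=. */
    if (best[LT - 1] && best[LE - 1])
    {
        if (cmp_ints(LE, best[LT - 1]->value, best[LE - 1]->value))
            best[LE - 1] = NULL;
        else
            best[LT - 1] = NULL;
    }
    if (best[GT - 1] && best[GE - 1])
    {
        if (cmp_ints(GE, best[GT - 1]->value, best[GE - 1]->value))
            best[GE - 1] = NULL;
        else
            best[GT - 1] = NULL;
    }
    return true;
}
```

With this sketch, "a > 1 AND a = 15" reduces to just the equality clause, while "a = 1 AND a = 3" is reported as contradictory, which corresponds to *constfalse being set in the patch.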
0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v8.patch
From 3ece043a9644b6b10c86a074d996b9bfa78f9970 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index a845288127..0ab076b98c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3562,12 +3597,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3579,6 +3617,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3609,12 +3648,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3811,12 +3851,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3838,11 +3878,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3850,17 +3890,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3871,12 +3929,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3890,20 +3949,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3916,8 +3974,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
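
To make the effect of passing an explicit datum count concrete, here is a minimal standalone sketch of a "greatest bound <= probe" binary search where the probe may supply only a prefix of the key columns. It assumes plain int columns and a two-column key. Bound, bound_prefix_cmp() and bound_bsearch() are hypothetical names and not the patch's API, and MINVALUE/MAXVALUE bounds are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2     /* hypothetical two-column range partition key */

typedef struct Bound
{
    int datums[NKEYS];
} Bound;

/*
 * Compare a full bound against the first nprobe columns of a probe.
 * Returns <0, 0 or >0; 0 means "equal on every column the probe supplies".
 */
int
bound_prefix_cmp(const Bound *bound, const int *probe, int nprobe)
{
    int i;

    assert(nprobe >= 1 && nprobe <= NKEYS);
    for (i = 0; i < nprobe; i++)
    {
        if (bound->datums[i] < probe[i])
            return -1;
        if (bound->datums[i] > probe[i])
            return 1;
    }
    return 0;
}

/*
 * Return the offset of a bound that is <= probe, preferring the greatest
 * one, or -1 if every bound is greater.  *is_equal tells whether the bound
 * at the returned offset compared equal to the probe.
 */
int
bound_bsearch(const Bound *bounds, int nbounds,
              const int *probe, int nprobe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmp = bound_prefix_cmp(&bounds[mid], probe, nprobe);

        if (cmp <= 0)
        {
            lo = mid;
            *is_equal = (cmp == 0);
            if (*is_equal)
                break;          /* stop at the first equal bound found */
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Note that with a prefix probe several bounds can compare equal, and the search stops at whichever of them it hits first. A caller that needs the leftmost or rightmost such bound has to step outward from the returned offset.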
0005-Implement-get_partitions_for_keys-v8.patch
From 4c5a0e7e5efc023c68fbfeeaf325fb59f341f8b1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 5/5] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
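
As an illustration of the range-partition equality lookup this patch performs (find the greatest bound <= the key, then look at the index stored for the next bound), here is a minimal standalone sketch. It assumes a single int key column and assumes the layout suggested by the comments in the patches: a sorted array of distinct bounds plus an index array with one extra entry, where indexes[i] is the partition whose upper bound is bounds[i] and -1 marks an unassigned range. All names below are hypothetical.

```c
#include <assert.h>

/* Greatest offset whose bound is <= key, or -1 if every bound is greater. */
int
greatest_bound_le(const int *bounds, int nbounds, int key)
{
    int lo = -1;
    int hi = nbounds - 1;

    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= key)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Return the index of the partition that can contain key.
 *
 * indexes[] has nbounds + 1 entries (assumed layout, see above).  If the
 * key falls into an unassigned range, default_index is returned; pass -1
 * when there is no default partition.
 */
int
range_partition_for_key(const int *bounds, const int *indexes, int nbounds,
                        int key, int default_index)
{
    int off;

    assert(nbounds > 0);
    off = greatest_bound_le(bounds, nbounds, key);

    /* The bound at off + 1 is the upper bound of the candidate range. */
    if (indexes[off + 1] >= 0)
        return indexes[off + 1];
    return default_index;
}
```

For example, with partitions covering [1, 10), [10, 20) and [30, 40), the bounds are {1, 10, 20, 30, 40} and the index array is {-1, 0, 1, -1, 2, -1}. A key of 15 maps to partition 1, while a key of 25 falls into the gap and maps to the default partition if there is one.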
src/backend/catalog/partition.c | 376 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 72 ++----
3 files changed, 398 insertions(+), 54 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0ab076b98c..5cc304cca0 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2594,7 +2594,381 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ eqoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minkeys matched one of the datums (because is_equal is true), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff/maxoff set to -1 means none of the datums in PartitionBoundInfo
+ * satisfies minkeys/maxkeys. If both are set to a valid datum offset,
+ * that means there exist at least some datums (and hence partitions)
+ * satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * If the query doesn't specify either the lower or the upper
+ * bound, consider including the default partition in the
+ * result set, because the existing partitions may not cover
+ * all of the values that such an unbounded range contains.
+ *
+ * Also, if minoff != maxoff, there might be datums in that
+ * range that don't have a non-default partition assigned.
+ */
+ if (keys->n_minkeys == 0 || keys->n_maxkeys == 0 ||
+ minoff != maxoff)
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 44f8713319..21267dad98 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -207,16 +207,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -482,15 +480,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -536,9 +532,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -612,8 +606,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -622,7 +614,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -636,16 +628,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -663,9 +653,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -675,9 +663,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -691,9 +677,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -732,16 +716,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -753,9 +735,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -782,8 +762,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -791,9 +771,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -833,9 +811,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -873,9 +849,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -887,9 +861,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote <
Langote_Amit_f8@lab.ntt.co.jp> wrote:
In the previous versions, RT index of the table needed to be passed to
partition.c, which I realized is no longer needed, so I removed that
requirement from the interface. As a result, patches 0002 and 0003 have
changed in this version.
Thanks for the fix.
I am getting wrong output when default is sub-partitioned further, below is
a test case.
CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
INSERT INTO lpd VALUES (null,null,null);
--on HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
QUERY PLAN
---------------------------------------------
Sort
Sort Key: ((lpd_p1.tableoid)::regclass)
-> Result
-> Append
-> Seq Scan on lpd_p1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_p2
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d3
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d2
Filter: (a IS NOT NULL)
(14 rows)
postgres=#
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
tableoid | a | b | c
----------+----+----+----
lpd_p1 | 1 | 1 | 1
lpd_p1 | 2 | 2 | 2
lpd_p1 | 3 | 3 | 3
lpd_p2 | 4 | 4 | 4
lpd_p2 | 5 | 5 | 5
lpd_d1 | 7 | 7 | 7
lpd_d1 | 8 | 8 | 8
lpd_d1 | 9 | 9 | 9
lpd_d2 | 12 | 12 | 12
lpd_d2 | 10 | 10 | 10
lpd_d2 | 11 | 11 | 11
lpd_d3 | 6 | 6 | 6
(12 rows)
--on HEAD + v8 patches
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
QUERY PLAN
---------------------------------------------
Sort
Sort Key: ((lpd_p1.tableoid)::regclass)
-> Result
-> Append
-> Seq Scan on lpd_p1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_p2
Filter: (a IS NOT NULL)
(8 rows)
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
tableoid | a | b | c
----------+---+---+---
lpd_p1 | 1 | 1 | 1
lpd_p1 | 2 | 2 | 2
lpd_p1 | 3 | 3 | 3
lpd_p2 | 4 | 4 | 4
lpd_p2 | 5 | 5 | 5
(5 rows)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/10/30 14:55, Amit Langote wrote:
Fixed in the attached updated patch, along with a new test in 0001 to
cover this case. Also, made a few tweaks to 0003 and 0005 (moved some
code from the former to the latter) around the handling of ScalarArrayOpExprs.
Sorry, I'd forgotten to include some changes.
In the previous versions, RT index of the table needed to be passed to
partition.c, which I realized is no longer needed, so I removed that
requirement from the interface. As a result, patches 0002 and 0003 have
changed in this version.
Some Minor comments:
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
Function name in function header is not correct.
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
DatumGetBool will return true if the first byte of constvalue is
nonzero, and false otherwise. IIUC, this is not the intention here.
Or am I missing something?
+ * clauses in this function ourselves, for example, having both a > 1 and
+ * a = 0 the list
a = 0 the list -> a = 0 in the list
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
I think for evaluating other expressions (e.g. T_Param) we will have
to pass ExprContext to this function. Or we can do something cleaner
because if we want to access the ExprContext inside
partkey_datum_from_expr then we may need to pass it to
"get_partitions_from_clauses" which is a common function for optimizer
and executor.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the test case.
On 2017/10/30 17:09, Rajkumar Raghuwanshi wrote:
I am getting wrong output when default is sub-partitioned further, below is
a test case.
CREATE TABLE lpd(a int, b varchar, c float) PARTITION BY LIST (a);
CREATE TABLE lpd_p1 PARTITION OF lpd FOR VALUES IN (1,2,3);
CREATE TABLE lpd_p2 PARTITION OF lpd FOR VALUES IN (4,5);
CREATE TABLE lpd_d PARTITION OF lpd DEFAULT PARTITION BY LIST(a);
CREATE TABLE lpd_d1 PARTITION OF lpd_d FOR VALUES IN (7,8,9);
CREATE TABLE lpd_d2 PARTITION OF lpd_d FOR VALUES IN (10,11,12);
CREATE TABLE lpd_d3 PARTITION OF lpd_d FOR VALUES IN (6,null);
INSERT INTO lpd SELECT i,i,i FROM generate_Series (1,12)i;
INSERT INTO lpd VALUES (null,null,null);
--on HEAD
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
QUERY PLAN
---------------------------------------------
Sort
Sort Key: ((lpd_p1.tableoid)::regclass)
-> Result
-> Append
-> Seq Scan on lpd_p1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_p2
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d3
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_d2
Filter: (a IS NOT NULL)
(14 rows)
postgres=#
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
tableoid | a | b | c
----------+----+----+----
lpd_p1 | 1 | 1 | 1
lpd_p1 | 2 | 2 | 2
lpd_p1 | 3 | 3 | 3
lpd_p2 | 4 | 4 | 4
lpd_p2 | 5 | 5 | 5
lpd_d1 | 7 | 7 | 7
lpd_d1 | 8 | 8 | 8
lpd_d1 | 9 | 9 | 9
lpd_d2 | 12 | 12 | 12
lpd_d2 | 10 | 10 | 10
lpd_d2 | 11 | 11 | 11
lpd_d3 | 6 | 6 | 6
(12 rows)
--on HEAD + v8 patches
postgres=# EXPLAIN (COSTS OFF) SELECT tableoid::regclass, * FROM lpd WHERE
a IS NOT NULL ORDER BY 1;
QUERY PLAN
---------------------------------------------
Sort
Sort Key: ((lpd_p1.tableoid)::regclass)
-> Result
-> Append
-> Seq Scan on lpd_p1
Filter: (a IS NOT NULL)
-> Seq Scan on lpd_p2
Filter: (a IS NOT NULL)
(8 rows)
postgres=# SELECT tableoid::regclass, * FROM lpd WHERE a IS NOT NULL ORDER
BY 1;
tableoid | a | b | c
----------+---+---+---
lpd_p1 | 1 | 1 | 1
lpd_p1 | 2 | 2 | 2
lpd_p1 | 3 | 3 | 3
lpd_p2 | 4 | 4 | 4
lpd_p2 | 5 | 5 | 5
(5 rows)
I found bugs in 0003 and 0005 that caused this. I will post the patches
containing the fix in reply to Dilip's email, which contains some code
review comments [1].
Also, I noticed that the new pruning code was having a hard time dealing
with the fact that the default "range" partition doesn't explicitly say in
its partition constraint that it might contain null values. More
precisely perhaps, the default range partition's constraint appears to
imply that it can only contain non-null values, which confuses the new
pruning code.
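To illustrate the point about null values, here is a minimal standalone
sketch in C of how a range-partitioned parent could route a possibly-null
key. It is not PostgreSQL code: RangePart, route_range_key, DEFAULT_PART and
NO_PART are hypothetical names made up for this example. It assumes, in line
with the "a is null" test output where only a sub-partition of the default is
scanned, that a null key matches no explicit range and can only land in the
default partition, which is why pruning must not discard the default for such
a clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one range partition covering [lower, upper). */
typedef struct RangePart
{
    int lower;
    int upper;
} RangePart;

#define DEFAULT_PART (-1)   /* row goes to the default partition */
#define NO_PART      (-2)   /* no partition accepts the row */

/*
 * Return the index of the partition accepting the key, DEFAULT_PART if only
 * the default partition can hold it, or NO_PART if nothing can.
 * Assumption: a null key never matches an explicit range bound.
 */
static int
route_range_key(const RangePart *parts, size_t nparts,
                bool has_default, int key, bool isnull)
{
    assert(parts != NULL || nparts == 0);

    if (!isnull)
    {
        for (size_t i = 0; i < nparts; i++)
        {
            if (key >= parts[i].lower && key < parts[i].upper)
                return (int) i;
        }
    }

    /* Null keys and keys outside every range fall through to here. */
    return has_default ? DEFAULT_PART : NO_PART;
}
```

With ranges [1, 10) and [15, 20) plus a default, a key of 12 and a null key
both end up in the default partition, so a constraint on the default that
reads as "non-null only" would understate what it can contain.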
Thanks,
Amit
[1]: /messages/by-id/CAFiTN-thYsobXxPS6bwOA_9erpax_S=iztSn3RtUxKKMKG4V4A@mail.gmail.com
Thanks Dilip for reviewing.
On 2017/10/31 1:50, Dilip Kumar wrote:
On Mon, Oct 30, 2017 at 12:20 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/10/30 14:55, Amit Langote wrote:
Fixed in the attached updated patch, along with a new test in 0001 to
cover this case. Also, made a few tweaks to 0003 and 0005 (moved some
code from the former to the latter) around the handling of ScalarArrayOpExprs.
Sorry, I'd forgotten to include some changes.
In the previous versions, RT index of the table needed to be passed to
partition.c, which I realized is no longer needed, so I removed that
requirement from the interface. As a result, patches 0002 and 0003 have
changed in this version.
Some Minor comments:
+ * get_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
Function name in function header is not correct.
Fixed.
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
DatumGetBool will return true if the first byte of constvalue is
nonzero, and false otherwise. IIUC, this is not the intention here.
Or am I missing something?
This coding pattern is in use in quite a few places; see for example in
restriction_is_constant_false() and many others like
relation_excluded_by_constraints(), negate_clause(), etc.
If a RestrictInfo is marked pseudoconstant=true, then the clause therein
must be a Const whose constvalue is 0 if the clause is false and non-zero
otherwise, so that DatumGetBool(constvalue) returns false for a false
clause and true for any other.
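The pattern described above can be sketched in standalone C as follows. This
is only an illustration and does not include the PostgreSQL headers: Datum
and Const here are cut-down stand-ins, and datum_get_bool and
clause_is_constant_false are hypothetical names. The stand-in treats any
non-zero value as true, which is a simplification of the first-byte test
mentioned in the review comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the PostgreSQL definitions. */
typedef uintptr_t Datum;

typedef struct Const
{
    Datum constvalue;   /* for a boolean constant: 0 = false, else true */
    bool  constisnull;  /* true if the constant is SQL NULL */
} Const;

/* Simplified analogue of DatumGetBool(): non-zero reads as true. */
static bool
datum_get_bool(Datum d)
{
    return d != 0;
}

/*
 * Hypothetical helper: a constant clause can never be satisfied if it is
 * NULL or evaluates to false, so the caller may mark the whole clause list
 * as "constant false" and skip further work.
 */
static bool
clause_is_constant_false(const Const *clause)
{
    assert(clause != NULL);
    return clause->constisnull || !datum_get_bool(clause->constvalue);
}
```

A caller looping over clauses would set its constfalse flag as soon as this
helper returns true for any constant clause.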
+ * clauses in this function ourselves, for example, having both a > 1 and
+ * a = 0 the list
a = 0 the list -> a = 0 in the list
Right, fixed.
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
I think for evaluating other expressions (e.g. T_Param) we will have
to pass ExprContext to this function.
That's right.
Or we can do something cleaner
because if we want to access the ExprContext inside
partkey_datum_from_expr then we may need to pass it to
"get_partitions_from_clauses" which is a common function for optimizer
and executor.
Yeah, I've thought about that a little. Since nothing else but the
planner calls it now and the planner doesn't itself have its hands on the
ExprContext that would be necessary for computing something like Params, I
left it out of the interface for now. That said, I *am* actually thinking
about some interface changes that would be necessary for some other
unrelated functionality/optimizations that callers like the run-time
pruning code would expect of get_partitions_from_clauses(). We can design
the interface extension such that the aforementioned ExprContext is passed
together.
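For discussion, one possible shape of such an interface extension is sketched
below in standalone C. Nothing here is from the patches: the node types are
cut-down stand-ins, and ParamContext and partkey_datum_from_expr_ctx are
hypothetical names. The idea is just that the planner passes NULL for the
context, so Params stay unevaluated there, while a run-time caller passes a
context from which parameter values can be looked up. The zero-based paramid
lookup is also an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the PostgreSQL definitions. */
typedef uintptr_t Datum;

typedef enum NodeTag { T_Const, T_Param, T_Other } NodeTag;

typedef struct Expr  { NodeTag type; } Expr;
typedef struct Const { Expr xpr; Datum constvalue; } Const;
typedef struct Param { Expr xpr; int paramid; } Param;

/* Hypothetical carrier for run-time parameter values. */
typedef struct ParamContext
{
    int          nparams;
    const Datum *values;    /* indexed by paramid, zero-based here */
} ParamContext;

/*
 * Extract a datum from expr if its value is known. With ctx == NULL (the
 * plan-time case) only constants qualify; with a context, Params do too.
 */
static bool
partkey_datum_from_expr_ctx(const Expr *expr, const ParamContext *ctx,
                            Datum *value)
{
    assert(expr != NULL && value != NULL);

    switch (expr->type)
    {
        case T_Const:
            *value = ((const Const *) expr)->constvalue;
            return true;

        case T_Param:
        {
            const Param *param = (const Param *) expr;

            if (ctx == NULL || param->paramid < 0 ||
                param->paramid >= ctx->nparams)
                return false;   /* value not available to this caller */
            *value = ctx->values[param->paramid];
            return true;
        }

        default:
            return false;
    }
}
```

A design like this keeps a single entry point for both callers, at the cost
of every plan-time call site having to pass an explicit NULL.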
Attached updated version of the patches addressing some of your comments
above and fixing a bug that Rajkumar reported [1]. As mentioned there,
I'm including here a patch (the 0005 of the attached) to tweak the default
range partition constraint to be explicit about null values that it might
contain. So, there are 6 patches now and what used to be patch 0005 in
the previous set is patch 0006 in this version of the set.
Thanks,
Amit
[1]: /messages/by-id/cd5a2d2e-0957-042c-40c2-06033fe0abf2@lab.ntt.co.jp
Attachments:
0001-Add-new-tests-for-partition-pruning-v9.patch (text/plain; charset=UTF-8)
From 15b02441e0194f21af99878605846045449c7067 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/6] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 986 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 140 +++++
4 files changed, 1128 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..6c669ffdfc
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,986 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(31 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on rlp_default_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a is not null;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3abcd
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3efgh
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_10
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_30
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NOT NULL)
+(29 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_null
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(25 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..75e8a58f36
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,140 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;
+explain (costs off) select * from rlp where a is not null;
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v9.patch
From 5c8a0c598eae43654c51fbebd3329b0d502fac4b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/6] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions)
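As a rough illustration of item 1, the collation test could look like the standalone sketch below. The macro has the same shape as PartCollMatchesExprColl added to allpaths.c by this patch; the Oid typedef, the InvalidOid definition, and the helper clause_collation_usable() are stand-ins assumed here so that the snippet compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for PostgreSQL's definitions, assumed for this sketch only. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Same shape as the macro added to allpaths.c by this patch. */
#define PartCollMatchesExprColl(partcoll, exprcoll) \
	((partcoll) == InvalidOid || (partcoll) == (exprcoll))

/*
 * Hypothetical helper (not part of the patch): can a clause with the given
 * input collation be used to prune on a key column whose partitioning
 * collation is partcoll?
 */
bool
clause_collation_usable(Oid partcoll, Oid clause_inputcollid)
{
	return PartCollMatchesExprColl(partcoll, clause_inputcollid);
}
```

A key column with no collation (InvalidOid) accepts any clause; otherwise the two collations must be identical, which is why the coll_pruning test with a mismatched collation is expected not to prune.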
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it is
obvious to the planner that a set of clauses matching the partition
key does not yet contain the constant values, in which case there is
no need to call get_partitions_from_clauses immediately. Instead, the
call should be deferred to another piece of code that can receive the
above list of clauses and that runs once the constant values become
available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
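For readers following along, here is a minimal standalone sketch of how a caller might combine the three outputs of get_partitions_from_clauses (a contiguous index range plus a set of extra indexes) into one sorted array, roughly as get_append_rel_partitions does in this patch. The helper collect_part_indexes() is hypothetical, a plain int array stands in for the Bitmapset, and malloc replaces palloc; none of this is the patch's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints; same logic as intcmp() in the patch. */
int
intcmp(const void *va, const void *vb)
{
	int a = *((const int *) va);
	int b = *((const int *) vb);

	if (a == b)
		return 0;
	return (a > b) ? 1 : -1;
}

/*
 * Hypothetical helper: gather indexes min_idx..max_idx plus the entries of
 * other_parts (a plain array standing in for a Bitmapset) into a malloc'd,
 * sorted array.  Returns the number of entries, 0 with *out == NULL when no
 * partition survives, or -1 on allocation failure.  A negative min_idx
 * means "no contiguous range" (an assumption of this sketch).
 */
int
collect_part_indexes(int min_idx, int max_idx,
					 const int *other_parts, int n_other,
					 int **out)
{
	int have_range = (min_idx >= 0 && max_idx >= min_idx);
	int n;
	int i;
	int j = 0;

	assert(n_other >= 0);
	n = n_other + (have_range ? max_idx - min_idx + 1 : 0);

	*out = NULL;
	if (n == 0)
		return 0;

	*out = malloc((size_t) n * sizeof(int));
	if (*out == NULL)
		return -1;

	/* First the contiguous range, then the stragglers. */
	if (have_range)
		for (i = min_idx; i <= max_idx; i++)
			(*out)[j++] = i;
	for (i = 0; i < n_other; i++)
		(*out)[j++] = other_parts[i];

	/* Sort so that callers visit partitions in bound order. */
	if (j > 1)
		qsort(*out, (size_t) j, sizeof(int), intcmp);
	return j;
}
```

With the stub in this commit, the range would be 0..nparts-1 and the extra set empty, so the result is simply every partition index in order.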
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 23 ++
src/backend/optimizer/path/allpaths.c | 568 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 90 ++++++
src/include/catalog/partition.h | 4 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 648 insertions(+), 118 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5daa8a1c19..7b0e022865 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,29 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e565b3c00..9e12e2b00e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,12 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
/*
@@ -834,6 +842,17 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->rows = clamp_row_est(rel->rows);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
/*
* set_foreign_pathlist
* Build access paths for a foreign table RTE
@@ -846,6 +865,365 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * an invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &constfalse);
+
+ /*
+ * Since the clauses in rel->baserestrictinfo should all contain Const
+ * operands, it should be possible to prune partitions right away.
+ */
+ if (partclauses != NIL && !constfalse)
+ {
+ get_partitions_from_clauses(parent, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &constfalse1) != NIL)
+ result = lappend(result, clause);
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely
+ * about the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it,
+ * provided its negator is a member of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ result = lappend(result, nulltest);
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1238,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1252,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1289,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1302,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,73 +1312,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1152,6 +1482,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1247,14 +1588,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1325,43 +1681,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * The AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list contains only its
+ * immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate them to our
+ * list before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1378,17 +1731,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..8e290e19b0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,82 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..7da99a9f41 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,8 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's type,
+ * store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
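
A note for readers of get_append_rel_partitions() in the patch above: the way it merges the contiguous [min_part_idx, max_part_idx] range with the extra indexes in other_parts into one sorted array can be sketched in standalone C. This is only an illustration under stated assumptions, not the patch's code: a plain bool array stands in for Bitmapset, malloc stands in for palloc, and the names collect_part_indexes and intcmp_sketch are hypothetical. Like the patch, it assumes the extra indexes do not overlap the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Comparator for qsort(); a stand-in for the intcmp used in the patch. */
static int
intcmp_sketch(const void *a, const void *b)
{
	int		x = *(const int *) a;
	int		y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Merge the index range [min_idx, max_idx] (ignored unless both bounds are
 * non-negative and min_idx <= max_idx) with the indexes flagged in
 * other[0 .. nparts-1].  Returns a malloc'd array sorted in ascending order
 * and stores its length in *count; returns NULL if nothing is selected.
 */
static int *
collect_part_indexes(int min_idx, int max_idx,
					 const bool *other, int nparts, int *count)
{
	bool	use_range = (min_idx >= 0 && max_idx >= min_idx);
	int		n = 0;
	int		j = 0;
	int		i;
	int	   *result;

	assert(count != NULL);

	/* First pass: count how many indexes we will return. */
	if (use_range)
		n += max_idx - min_idx + 1;
	for (i = 0; i < nparts; i++)
		if (other[i])
			n++;

	*count = 0;
	if (n == 0)
		return NULL;

	result = malloc(n * sizeof(int));
	if (result == NULL)
		return NULL;

	/* Second pass: range members first, then the scattered ones. */
	if (use_range)
		for (i = min_idx; i <= max_idx; i++)
			result[j++] = i;
	for (i = 0; i < nparts; i++)
		if (other[i])
			result[j++] = i;

	/* Sort so that the caller visits partitions in bound order. */
	if (j > 1)
		qsort(result, j, sizeof(int), intcmp_sketch);

	*count = j;
	return result;
}
```

Calling collect_part_indexes(2, 3, other, 6, &count) with other[0] and other[5] set yields the four indexes 0, 2, 3, 5, which is the order in which the patch would then fetch rel->part_appinfos[].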
0003-Implement-get_partitions_from_clauses-v9.patch (text/plain; charset=UTF-8)
From bb6c8c3634ccb1b82fd2031b992af636eac09bd5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/6] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1126 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1122 insertions(+), 4 deletions(-)
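
To make the idea of "classifying clauses into a set of keys" concrete, here is a much-simplified sketch for a single integer partition key. It folds mutually ANDed clauses into an equality key or the tightest lower and upper bounds, roughly in the spirit of the eqkeys/minkeys/maxkeys fields of PartScanKeyInfo. Everything in it (SketchStrategy, SketchClause, SketchScanKeys, classify_bounds) is hypothetical and is not the code in this patch; it ignores multi-column keys, operator families, collations and NullTests, and it only detects the obvious contradictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison strategies for one integer partition key. */
typedef enum { SK_LT, SK_LE, SK_EQ, SK_GE, SK_GT } SketchStrategy;

/* One clause of the form "key <op> value". */
typedef struct
{
	SketchStrategy op;
	int			value;
} SketchClause;

/* Simplified single-column analogue of PartScanKeyInfo. */
typedef struct
{
	bool		has_eq;
	int			eqkey;
	bool		has_min;
	int			minkey;
	bool		min_incl;
	bool		has_max;
	int			maxkey;
	bool		max_incl;
} SketchScanKeys;

/*
 * Fold ANDed clauses into the tightest bounds.  Returns false if the
 * clauses are found to contradict each other (the "constfalse" case).
 */
static bool
classify_bounds(const SketchClause *clauses, int nclauses,
				SketchScanKeys *keys)
{
	int			i;

	assert(keys != NULL);
	keys->has_eq = keys->has_min = keys->has_max = false;
	keys->min_incl = keys->max_incl = false;
	keys->eqkey = keys->minkey = keys->maxkey = 0;

	for (i = 0; i < nclauses; i++)
	{
		int			v = clauses[i].value;
		bool		incl;

		switch (clauses[i].op)
		{
			case SK_EQ:
				if (keys->has_eq && keys->eqkey != v)
					return false;	/* key = a AND key = b, a <> b */
				keys->has_eq = true;
				keys->eqkey = v;
				break;
			case SK_GT:
			case SK_GE:
				/* Keep the larger lower bound; on a tie, exclusive wins. */
				incl = (clauses[i].op == SK_GE);
				if (!keys->has_min || v > keys->minkey ||
					(v == keys->minkey && !incl))
				{
					keys->has_min = true;
					keys->minkey = v;
					keys->min_incl = incl;
				}
				break;
			case SK_LT:
			case SK_LE:
				/* Keep the smaller upper bound; on a tie, exclusive wins. */
				incl = (clauses[i].op == SK_LE);
				if (!keys->has_max || v < keys->maxkey ||
					(v == keys->maxkey && !incl))
				{
					keys->has_max = true;
					keys->maxkey = v;
					keys->max_incl = incl;
				}
				break;
		}
	}

	/* Lower bound above upper bound, or equal with an exclusive end. */
	if (keys->has_min && keys->has_max &&
		(keys->minkey > keys->maxkey ||
		 (keys->minkey == keys->maxkey &&
		  !(keys->min_incl && keys->max_incl))))
		return false;

	if (keys->has_eq)
	{
		if (keys->has_min &&
			(keys->eqkey < keys->minkey ||
			 (keys->eqkey == keys->minkey && !keys->min_incl)))
			return false;
		if (keys->has_max &&
			(keys->eqkey > keys->maxkey ||
			 (keys->eqkey == keys->maxkey && !keys->max_incl)))
			return false;
		/* An equality key makes the range bounds irrelevant. */
		keys->has_min = keys->has_max = false;
	}

	return true;
}
```

For example, "key >= 10 AND key < 20 AND key > 10" collapses to an exclusive lower bound of 10 and an exclusive upper bound of 20, while "key = 5 AND key < 3" is reported as contradictory.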
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7b0e022865..d3a039cd78 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes
+ * inclusive are part of the set.) Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include the default partition index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1438,15 +1553,1018 @@ get_partitions_from_clauses(Relation relation, List *partclauses,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ List *partconstr = RelationGetPartitionQual(relation);
+ PartitionSet *partset;
+
+ if (partconstr)
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ partset = get_partitions_from_clauses_guts(relation, partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * OR clauses found in the list are handled by recursing on each of their
+ * arguments, taking the union of the resulting partition sets, and then
+ * intersecting that union with the set obtained from the remaining clauses.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, List *clauses)
+{
+ PartitionSet *partset;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset;
+
+ arg_partset = get_partitions_from_clauses_guts(relation,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->use_range = in->use_range;
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (b->all_parts)
+ return a;
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl = false,
+ max_incl = false;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Skip if leftop doesn't match this partition key column. */
+ if ((!IsA(leftop, Var) ||
+ ((Var *) leftop)->varattno != partattno) &&
+ !equal(leftop, partexpr))
+ continue;
+
+ /*
+ * Deal with <> operators that the planner allows if it finds
+ * out that <>'s negator is indeed a valid partopfamily member.
+ * Make an equivalent OR expression and add to the *or_clauses
+ * list. That is, we convert a <> opclause into
+ * (leftop < rightop) OR (leftop > rightop).
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily) &&
+ (partkey->strategy == PARTITION_STRATEGY_RANGE ||
+ partkey->strategy == PARTITION_STRATEGY_LIST))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ negator = get_negator(opclause->opno);
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr),
+ -1));
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * Planner must have accepted this saop iff saop_op's negator
+ * was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /* Use a separate counter; i indexes the partition key columns. */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value;
+ * returns false if expr is not of a supported expression type
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * remove_redundant_clauses
+ * Reduce all_clauses, the clauses matched to the partition key column
+ * at offset partattoff, to the most restrictive clause of each btree
+ * strategy and return the surviving clauses in *result
+ *
+ * Clauses that cannot be compared against the current best clause of the
+ * same strategy are passed through to *result unchanged. *constfalse is
+ * set to true if the clauses are found to be mutually contradictory.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause of each btree strategy for the
+ * partition key column at offset partattoff. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+/*
+ * partition_cmp_args
+ * Evaluate "leftarg op rightarg", where the operator is taken from the
+ * clause op and the operands are the constant arguments of the clauses
+ * leftarg and rightarg
+ *
+ * Returns true and sets *result to the outcome if the comparison could be
+ * performed; returns false otherwise, in which case *result should be
+ * ignored.
+ */
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
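
To make the intended PartitionSet semantics in the patch above a bit easier to follow, here is a minimal standalone sketch of how two sets are combined. It is not the patch's code: SketchSet and the sketch_* functions are hypothetical names, a 64-bit mask stands in for Bitmapset (so at most 64 partitions), and "adjacent" is taken to mean that one inclusive range ends exactly one index before the other begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionSet (not the patch's struct). */
typedef struct SketchSet
{
    bool     use_range;  /* is the contiguous range still in use? */
    int      min_idx;    /* inclusive; -1 when the range is empty */
    int      max_idx;    /* inclusive; -1 when the range is empty */
    uint64_t others;     /* members outside the range, one bit each */
} SketchSet;

static SketchSet
sketch_range(int lo, int hi)
{
    SketchSet s = {true, lo, hi, 0};

    return s;
}

static bool
sketch_range_empty(const SketchSet *s)
{
    return s->min_idx < 0 && s->max_idx < 0;
}

/* Expand the inclusive range into a bitmask. */
static uint64_t
sketch_range_bits(const SketchSet *s)
{
    uint64_t bits = 0;
    int      i;

    if (!sketch_range_empty(s))
        for (i = s->min_idx; i <= s->max_idx; i++)
            bits |= UINT64_C(1) << i;
    return bits;
}

/* a = a UNION b */
static void
sketch_union(SketchSet *a, const SketchSet *b)
{
    if (a->use_range && !sketch_range_empty(a) && !sketch_range_empty(b))
    {
        /* Overlapping or adjacent ranges merge into one range. */
        if (a->min_idx <= b->max_idx + 1 && b->min_idx <= a->max_idx + 1)
        {
            if (b->min_idx < a->min_idx)
                a->min_idx = b->min_idx;
            if (b->max_idx > a->max_idx)
                a->max_idx = b->max_idx;
        }
        else
        {
            /* Disjoint: fall back to the bitmask representation. */
            a->others |= sketch_range_bits(a) | sketch_range_bits(b);
            a->use_range = false;
            a->min_idx = a->max_idx = -1;
        }
    }
    else if (a->use_range && sketch_range_empty(a))
    {
        a->min_idx = b->min_idx;
        a->max_idx = b->max_idx;
    }
    else
        a->others |= sketch_range_bits(b);

    a->others |= b->others;
}

/* a = a INTERSECT b, range and bitmask handled separately as in the patch */
static void
sketch_intersect(SketchSet *a, const SketchSet *b)
{
    if (sketch_range_empty(a) || sketch_range_empty(b) ||
        a->min_idx > b->max_idx || b->min_idx > a->max_idx)
        a->min_idx = a->max_idx = -1;
    else
    {
        if (b->min_idx > a->min_idx)
            a->min_idx = b->min_idx;
        if (b->max_idx < a->max_idx)
            a->max_idx = b->max_idx;
    }
    a->others &= b->others;
}
```

For example, the union of ranges 0..1 and 2..3 stays a single range 0..3, while the union of 0..1 and 5..6 gives up the range representation and records all four indexes in the bitmask.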
Attachment: 0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v9.patch (text/plain; charset=UTF-8)
From 90205c145922b0b0faf6f63452e9c0dfb4f63b94 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/6] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
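
As a rough illustration of why the number of datums needs to travel with the probe, here is a small standalone sketch of comparing a multi-column bound against only a prefix of key values. It assumes plain integer keys; SketchCmpArg and sketch_bound_cmp are hypothetical names and only mimic the datums/ndatums part of PartitionBoundCmpArg.

```c
#include <assert.h>

/* Hypothetical, simplified analogue of PartitionBoundCmpArg. */
typedef struct SketchCmpArg
{
    const int *datums;   /* probe values, one per specified key column */
    int        ndatums;  /* how many leading key columns are specified */
} SketchCmpArg;

/*
 * Compare the first arg->ndatums columns of bound against the probe.
 * Returns <0, 0 or >0 if the bound is less than, equal to or greater
 * than the probe on that prefix.
 */
static int
sketch_bound_cmp(const int *bound, const SketchCmpArg *arg)
{
    int i;

    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] < arg->datums[i])
            return -1;
        if (bound[i] > arg->datums[i])
            return 1;
    }
    return 0;
}
```

With a bound of (10, 20), a one-column probe (10) compares as equal on the prefix, whereas the two-column probe (10, 30) compares as greater than the bound. A comparison that always looked at every key column would have nothing to compare against for the second column of the one-column probe.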
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d3a039cd78..ca55d65c4b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3564,12 +3599,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3581,6 +3619,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3611,12 +3650,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3813,12 +3853,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3840,11 +3880,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3852,17 +3892,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3873,12 +3931,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3892,20 +3951,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3918,8 +3976,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
Attachment: 0005-Tweak-default-range-partition-s-constraint-a-little-v9.patch (text/plain; charset=UTF-8)
From d6b496920133088a33d4a5bf417b220bd9d271a1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 31 Oct 2017 16:26:55 +0900
Subject: [PATCH 5/6] Tweak default range partition's constraint a little
When the constraint is used as a predicate, it's useful for it to say
explicitly that the default range partition might contain nulls, because
non-default range partitions can't.
---
src/backend/catalog/partition.c | 29 +++++++++++++++++++++++------
src/test/regress/expected/update.out | 2 +-
2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ca55d65c4b..e9683efcaa 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -3116,12 +3116,29 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (or_expr_args != NIL)
{
- /* OR all the non-default partition constraints; then negate it */
- result = lappend(result,
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args, -1)
- : linitial(or_expr_args));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ Expr *other_parts_constr;
+
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
return result;
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index cef70b1a1e..40217bdf9c 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -227,7 +227,7 @@ create table part_def partition of range_parted default;
a | text | | | | extended | |
b | integer | | | | plain | |
Partition of: range_parted DEFAULT
-Partition constraint: (NOT (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20))))
+Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20)))))
insert into range_parted values ('c', 9);
-- ok
--
2.11.0
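For illustration only (this is not part of the patch): a minimal sketch of why the added IS NOT NULL tests matter. Under SQL's three-valued logic, the old form of the default partition's constraint evaluates to unknown for a null key, whereas the new form evaluates to true, which is what lets it state explicitly that nulls may be present. The `tri` type and every function name below are hypothetical, and a single-column key with one non-default partition covering [1, 10) is assumed.

```c
#include <stdbool.h>

/* Hypothetical three-valued truth type mimicking SQL's TRUE/FALSE/NULL. */
typedef enum { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN } tri;

static tri tri_not(tri v)
{
    if (v == TRI_UNKNOWN)
        return TRI_UNKNOWN;
    return v == TRI_TRUE ? TRI_FALSE : TRI_TRUE;
}

static tri tri_and(tri a, tri b)
{
    if (a == TRI_FALSE || b == TRI_FALSE)
        return TRI_FALSE;
    if (a == TRI_UNKNOWN || b == TRI_UNKNOWN)
        return TRI_UNKNOWN;
    return TRI_TRUE;
}

/* (b >= 1) AND (b < 10); comparisons against a null b are unknown. */
static tri in_range(bool b_is_null, int b)
{
    if (b_is_null)
        return TRI_UNKNOWN;
    return (b >= 1 && b < 10) ? TRI_TRUE : TRI_FALSE;
}

/* Old form: NOT ((b >= 1) AND (b < 10)) */
tri old_default_constraint(bool b_is_null, int b)
{
    return tri_not(in_range(b_is_null, b));
}

/* New form: NOT ((b IS NOT NULL) AND ((b >= 1) AND (b < 10))) */
tri new_default_constraint(bool b_is_null, int b)
{
    tri not_null = b_is_null ? TRI_FALSE : TRI_TRUE;

    return tri_not(tri_and(not_null, in_range(b_is_null, b)));
}
```

For non-null keys both forms agree; they differ only when the key is null.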
Attachment: 0006-Implement-get_partitions_for_keys-v9.patch (text/plain; charset=UTF-8)
From f73f51400c6d02193d0c9a64508c90aa3a53137f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 6/6] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 379 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 72 ++----
3 files changed, 401 insertions(+), 54 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e9683efcaa..032e9bffc4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2596,7 +2596,384 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+
+ /*
+ * minoff set to -1 means all datums are greater than
+ * minkeys, which means all partitions satisfy minkeys.
+ */
+ if (minoff == -1)
+ minoff = 0;
+
+ /*
+ * minkeys matched one of the datums (is_equal is true), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 6c669ffdfc..35bbb5da97 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -675,16 +667,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -702,9 +692,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -714,9 +702,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -730,9 +716,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -771,16 +755,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -792,9 +774,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -821,8 +801,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -830,9 +810,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -872,9 +850,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -912,9 +888,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -926,9 +900,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
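For illustration only (not part of the patch series): get_partitions_for_keys() above leans on the contract stated in the comment on partition_bound_bsearch(), namely that it returns the offset of the greatest bound less than or equal to the probe, or -1 if every bound is greater. The sketch below shows that contract over a plain sorted int array. The name `bound_bsearch` and the use of ints in place of partition bounds are assumptions made for the example.

```c
#include <stdbool.h>

/*
 * Return the index of the greatest element of the sorted array bounds[]
 * that is <= probe, or -1 if every element is greater than probe.
 * *is_equal is set to true only if the element found equals probe.
 */
int bound_bsearch(const int *bounds, int nbounds, int probe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        /* Round up so that mid > lo and the loop always makes progress. */
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= probe)
        {
            lo = mid;
            *is_equal = (bounds[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

With bounds {1, 10, 20}, a probe of 5 yields offset 0, so for a range-partitioned table the bound at offset 1 would be the upper bound of the partition to scan; that is the step the patch expresses as `eqoff += 1`.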
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version of the patches addressing some of your comments
above and fixing a bug that Rajkumar reported [1]. As mentioned there,
I'm including here a patch (the 0005 of the attached) to tweak the default
range partition constraint to be explicit about null values that it might
contain. So, there are 6 patches now and what used to be patch 0005 in
the previous set is patch 0006 in this version of the set.
Hi Amit,
I've been looking over this. I see the latest patches conflict with
cf7ab13bf. Can you send patches rebased on current master?
Thanks
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version of the patches addressing some of your comments
I've spent a bit of time looking at these but I'm out of time for now.
So far I have noted down the following:
1. This comment seems wrong.
/*
* Since the clauses in rel->baserestrictinfo should all contain Const
* operands, it should be possible to prune partitions right away.
*/
How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
baserestrictinfo in this case will contain a single RestrictInfo with
an OpExpr containing two Var args and it'll come right through that
function too.
2. This code is way more complex than it needs to be.
if (num_parts > 0)
{
int j;
all_indexes = (int *) palloc(num_parts * sizeof(int));
j = 0;
if (min_part_idx >= 0 && max_part_idx >= 0)
{
for (i = min_part_idx; i <= max_part_idx; i++)
all_indexes[j++] = i;
}
if (!bms_is_empty(other_parts))
while ((i = bms_first_member(other_parts)) >= 0)
all_indexes[j++] = i;
if (j > 1)
qsort((void *) all_indexes, j, sizeof(int), intcmp);
}
It looks like the min/max partition stuff is just complicating things
here. If you need to build this array of all_indexes[] anyway, I don't
quite understand the point of the min/max. It seems like min/max would
probably work a bit nicer if you didn't need the other_parts
Bitmapset, so I recommend getting rid of min/max completely and
just having a Bitmapset with a bit set for each partition index you
need. Then you'd not need to go to the trouble of performing a qsort on
an array, and you could get rid of quite a chunk of code too.
The entire function would then not be much more complex than:
partindexes = get_partitions_from_clauses(parent, partclauses);
while ((i = bms_first_member(partindexes)) >= 0)
{
AppendRelInfo *appinfo = rel->part_appinfos[i];
result = lappend(result, appinfo);
}
Then you can also get rid of your intcmp() function too.
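For illustration only: the reason no qsort is needed with a bitmap is that walking its set bits from low to high already yields the members in ascending order, whatever order they were added in. The sketch below uses a fixed 64-bit mask as a stand-in; `partset64` and its functions are hypothetical names, and PostgreSQL's Bitmapset is variable-length rather than limited to 64 members.

```c
#include <stdint.h>

/* Hypothetical fixed-size set of partition indexes 0..63. */
typedef uint64_t partset64;

/*
 * Return the smallest member greater than prev, or -1 if there is none.
 * Pass prev = -1 to fetch the first member.
 */
int partset64_next(partset64 set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
    {
        if (set & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}

/* Store the members of set in out[] in ascending order; return the count. */
int partset64_members(partset64 set, int *out)
{
    int n = 0;

    for (int i = partset64_next(set, -1); i >= 0; i = partset64_next(set, i))
        out[n++] = i;
    return n;
}
```

Adding index 7 first and then 2, 3 and 4 still produces 2, 3, 4, 7 on iteration, so a consecutive range and the scattered "other" partitions can share one set.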
3. The following code has the wrong size calculation:
memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *));
It should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have
worked on your machine if you're compiling as 32 bit.
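For illustration only: a common way to avoid this class of mistake is to take sizeof of the dereferenced pointer, so the element size follows the array's declared type. In the sketch below `MAX_KEYS` and `NullKind` are stand-ins for PARTITION_MAX_KEYS and NullTestType, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 32     /* stand-in for PARTITION_MAX_KEYS */

/* Stand-in for NullTestType. */
typedef enum { KIND_IS_NULL, KIND_IS_NOT_NULL } NullKind;

/* Fill all MAX_KEYS elements with 0xFF bytes, i.e. "not set". */
void init_keynullness(NullKind *keynullness)
{
    /* sizeof(*keynullness) is the element size, not the pointer size. */
    memset(keynullness, -1, MAX_KEYS * sizeof(*keynullness));
}

/* Byte count the buggy call would write: pointer-sized elements. */
size_t wrong_size(void)
{
    return MAX_KEYS * sizeof(NullKind *);
}

/* Byte count the array actually occupies. */
size_t right_size(void)
{
    return MAX_KEYS * sizeof(NullKind);
}
```

On a typical 64-bit build a pointer is 8 bytes and an enum is 4, so the wrong size would write twice as many bytes as the array holds; on a typical 32-bit build the two sizes coincide, which would hide the bug.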
I'll continue on with the review in the next few days.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3 November 2017 at 17:32, David Rowley <david.rowley@2ndquadrant.com> wrote:
2. This code is way more complex than it needs to be.
if (num_parts > 0)
{
int j;
all_indexes = (int *) palloc(num_parts * sizeof(int));
j = 0;
if (min_part_idx >= 0 && max_part_idx >= 0)
{
for (i = min_part_idx; i <= max_part_idx; i++)
all_indexes[j++] = i;
}
if (!bms_is_empty(other_parts))
while ((i = bms_first_member(other_parts)) >= 0)
all_indexes[j++] = i;
if (j > 1)
qsort((void *) all_indexes, j, sizeof(int), intcmp);
}
It looks like the min/max partition stuff is just complicating things
here. If you need to build this array of all_indexes[] anyway, I don't
quite understand the point of the min/max. It seems like min/max would
probably work a bit nicer if you didn't need the other_parts
BitmapSet, so I recommend just getting rid of min/max completely and
just have a BitmapSet with bit set for each partition's index you
need, you'd not need to go to the trouble of performing a qsort on an
array and you could get rid of quite a chunk of code too.
The entire function would then not be much more complex than:
partindexes = get_partitions_from_clauses(parent, partclauses);
while ((i = bms_first_member(partindexes)) >= 0)
{
AppendRelInfo *appinfo = rel->part_appinfos[i];
result = lappend(result, appinfo);
}
Then you can also get rid of your intcmp() function too.
I've read a bit more of the patch and I think even more now that the
min/max stuff should be removed.
I understand that you'll be bsearching for a lower and upper bound for
cases like:
SELECT * FROM pt WHERE key BETWEEN 10 and 20;
but it looks like the min and max range stuff is thrown away if the
query is written as:
SELECT * FROM pt WHERE key BETWEEN 10 and 20 OR key BETWEEN 30 AND 40;
from reading the code, it seems like partset_union() would be called
in this case and if the min/max of each were consecutive then the
min/max range would get merged, but there's really a lot of code to
support this. I think it would be much better to invent
bms_add_range() and just use a Bitmapset to store the partition
indexes to scan. You could simply use bms_union for OR cases and
bms_intersect() for AND cases. It seems this would allow removal of
this complex code. It looks like this would allow you to remove all
the partset_range_* macros too.
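As a rough illustration of the Bitmapset-only approach, the sketch below uses a single 64-bit word as a stand-in for a Bitmapset, so it only handles up to 64 partitions. The type and function names are made up for this example; real code would use bms_union(), bms_intersect() and the proposed bms_add_range().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means "scan partition i". */
typedef uint64_t DemoPartSet;

/* Build a set with bits lower..upper (inclusive) set; needs 0 <= lower <= upper <= 63. */
static DemoPartSet
demo_range(int lower, int upper)
{
    DemoPartSet mask = ~(DemoPartSet) 0;

    assert(lower >= 0 && lower <= upper && upper <= 63);
    mask <<= lower;                             /* clear bits below 'lower' */
    mask &= ~(DemoPartSet) 0 >> (63 - upper);   /* clear bits above 'upper' */
    return mask;
}

/* OR of two clauses: a partition qualifies if either side selects it. */
static DemoPartSet
demo_or(DemoPartSet a, DemoPartSet b)
{
    return a | b;
}

/* AND of two clauses: a partition qualifies only if both sides select it. */
static DemoPartSet
demo_and(DemoPartSet a, DemoPartSet b)
{
    return a & b;
}
```

With this representation, two BETWEEN ranges joined by OR become demo_or(demo_range(...), demo_range(...)), and no separate min/max bookkeeping or merge step is needed whether or not the ranges touch.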
I've attached a patch which implements bms_add_range() which will save
you from having to write the tight loops to call bms_add_member() such
as the ones in partset_union(). Those would not be so great if there
was a huge number of partitions as the Bitmapset->words[] array could
be expanded many more times than required. bms_add_range() will handle
that much more efficiently with a maximum of 1 repalloc() for the
whole operation. It would also likely be faster since it's working at the
bitmapword level rather than bit level.
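The bit-level versus word-level difference can be seen in a standalone sketch. It works on a fixed-size array of 64-bit words rather than a resizable Bitmapset, and every name in it is invented for the example; it is only meant to show the masking idea, not to reproduce the attached patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BITS_PER_WORD 64
#define DEMO_NWORDS 4           /* fixed capacity: members 0..255 */

/* Bit-level: one read-modify-write per member in the range. */
static void
demo_add_range_bitwise(uint64_t *words, int lower, int upper)
{
    int i;

    assert(lower >= 0 && lower <= upper &&
           upper < DEMO_NWORDS * DEMO_BITS_PER_WORD);
    for (i = lower; i <= upper; i++)
        words[i / DEMO_BITS_PER_WORD] |=
            (uint64_t) 1 << (i % DEMO_BITS_PER_WORD);
}

/* Word-level: one OR per word; only the first and last words need trimming. */
static void
demo_add_range_wordwise(uint64_t *words, int lower, int upper)
{
    int lword, uword, w;

    assert(lower >= 0 && lower <= upper &&
           upper < DEMO_NWORDS * DEMO_BITS_PER_WORD);
    lword = lower / DEMO_BITS_PER_WORD;
    uword = upper / DEMO_BITS_PER_WORD;

    for (w = lword; w <= uword; w++)
    {
        uint64_t mask = ~(uint64_t) 0;

        /* In the first word, drop the bits below 'lower'. */
        if (w == lword)
            mask &= ~(uint64_t) 0 << (lower % DEMO_BITS_PER_WORD);
        /* In the last word, drop the bits above 'upper'. */
        if (w == uword)
            mask &= ~(uint64_t) 0 >>
                (DEMO_BITS_PER_WORD - 1 - upper % DEMO_BITS_PER_WORD);
        words[w] |= mask;
    }
}
```

Both functions produce the same set; the word-level one does work proportional to the number of words touched rather than the number of members added.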
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Add-bms_add_range-to-add-members-within-the-specifie.patch (application/octet-stream)
From f32760f730048b35c1045f005a1b2c9898a759e6 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 6 Nov 2017 14:39:33 +1300
Subject: [PATCH] Add bms_add_range to add members within the specified range
The same behavior could be obtained by looping and using bms_add_member,
however, using bms_add_range operates at the bitmapword level and should
be far faster when the range is large.
Author: David Rowley
---
src/backend/nodes/bitmapset.c | 74 +++++++++++++++++++++++++++++++++++++++++++
src/include/nodes/bitmapset.h | 1 +
2 files changed, 75 insertions(+)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index bf8545d..34d242b 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -785,6 +785,80 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
}
/*
+ * bms_add_range
+ * Add members in the range of 'lower' to 'upper' to the set.
+ *
+ * Note this could also be done by calling bms_add_member in a loop, however,
+ * using this function will be faster when the range is large as we work at
+ * the bitmapword level rather than at bit level.
+ */
+Bitmapset *
+bms_add_range(Bitmapset *a, int lower, int upper)
+{
+ int lwordnum,
+ uwordnum,
+ wordnum;
+
+ if (lower < 0 || upper < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+ if (lower > upper)
+ elog(ERROR, "lower range must not be above upper range");
+ uwordnum = WORDNUM(upper);
+
+ if (a == NULL)
+ {
+ a = (Bitmapset *) palloc0(BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ }
+
+ /* ensure we have enough words to store the upper bit */
+ else if (uwordnum >= a->nwords)
+ {
+ int oldnwords = a->nwords;
+ int i;
+
+ a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ /* zero out the enlarged portion */
+ for (i = oldnwords; i < a->nwords; i++)
+ a->words[i] = 0;
+ }
+
+ wordnum = lwordnum = WORDNUM(lower);
+
+ /*
+ * Starting at lower's wordnum, loop over each word up to upper's wordnum.
+ * Along the way set all bits inclusively between lower and upper to 1. We
+ * only need to handle the lwordnum and uwordnum specially so we don't set
+ * any bits outside of the range.
+ */
+ while (wordnum <= uwordnum)
+ {
+ bitmapword mask = (bitmapword) ~0;
+
+ /* If working on the lower word, zero out bits below 'lower'. */
+ if (wordnum == lwordnum)
+ {
+ int lbitnum = BITNUM(lower);
+ mask >>= lbitnum;
+ mask <<= lbitnum;
+ }
+
+ /* Likewise, if working on the upper word, zero bits above 'upper' */
+ if (wordnum == uwordnum)
+ {
+ int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+ mask <<= ushiftbits;
+ mask >>= ushiftbits;
+ }
+
+ a->words[wordnum++] |= mask;
+ }
+
+ return a;
+}
+
+/*
* bms_int_members - like bms_intersect, but left input is recycled
*/
Bitmapset *
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index aa3fb25..3b62a97 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,6 +90,7 @@ extern bool bms_is_empty(const Bitmapset *a);
extern Bitmapset *bms_add_member(Bitmapset *a, int x);
extern Bitmapset *bms_del_member(Bitmapset *a, int x);
extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_join(Bitmapset *a, Bitmapset *b);
--
1.9.5.msysgit.1
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version of the patches
match_clauses_to_partkey() needs to allow for the way quals on Bool
columns are represented.
create table pt (a bool not null) partition by list (a);
create table pt_true partition of pt for values in('t');
create table pt_false partition of pt for values in('f');
explain select * from pt where a = true;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..76.20 rows=2810 width=1)
-> Seq Scan on pt_false (cost=0.00..38.10 rows=1405 width=1)
Filter: a
-> Seq Scan on pt_true (cost=0.00..38.10 rows=1405 width=1)
Filter: a
(5 rows)
match_clause_to_indexcol() shows an example of how to handle this.
explain select * from pt where a = false;
will need to be allowed too. This works slightly differently.
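For reference, the planner simplifies "a = true" to the bare column reference "a" and "a = false" to "NOT a", so neither reaches the matching code as an operator expression. A toy sketch of the extra matching step could look like the following. It uses invented, minimal node types rather than PostgreSQL's Node structs, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented, minimal expression shapes; not PostgreSQL's node types. */
typedef enum DemoExprKind
{
    DEMO_VAR,                   /* a plain column reference */
    DEMO_NOT                    /* NOT <arg> */
} DemoExprKind;

typedef struct DemoExpr
{
    DemoExprKind kind;
    int         attno;          /* column number, used by DEMO_VAR */
    const struct DemoExpr *arg; /* operand, used by DEMO_NOT */
} DemoExpr;

/*
 * If 'clause' is "key" or "NOT key" for the boolean partition key column
 * 'keyattno', store the implied comparison constant in *value and return
 * true.  Otherwise return false and leave *value untouched.
 */
static bool
demo_match_bool_partkey(const DemoExpr *clause, int keyattno, bool *value)
{
    assert(clause != NULL && value != NULL);

    if (clause->kind == DEMO_VAR && clause->attno == keyattno)
    {
        *value = true;          /* "a" is treated as "a = true" */
        return true;
    }
    if (clause->kind == DEMO_NOT && clause->arg != NULL &&
        clause->arg->kind == DEMO_VAR && clause->arg->attno == keyattno)
    {
        *value = false;         /* "NOT a" is treated as "a = false" */
        return true;
    }
    return false;
}
```

Once the clause has been mapped back to a key/constant pair like this, it can be fed to the same lookup that handles ordinary equality clauses.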
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David,
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version of the patches addressing some of your comments
I've spent a bit of time looking at these but I'm out of time for now.
Thanks a lot for the review and sorry for the delay in sending rebased
patches.
So far I have noted down the following;
1. This comment seems wrong.
/*
* Since the clauses in rel->baserestrictinfo should all contain Const
* operands, it should be possible to prune partitions right away.
*/
Yes. I used to think it was true, then realized it isn't and updated the
code to get rid of that assumption, but I forgot to update this comment.
Fixed.
How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
baserestrictinfo in this case will contain a single RestrictInfo with
an OpExpr containing two Var args and it'll come right through that
function too.
As it should, I think. Quite similarly, you will be able to see that an
index path won't be considered for such a clause:
create table foo (a int, b int);
create index fooi on foo (a);
insert into foo select generate_series(1, 100000);
explain select * from foo where a = 1;
QUERY PLAN
----------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=12.17..482.50 rows=500 width=8)
Recheck Cond: (a = 1)
-> Bitmap Index Scan on fooi (cost=0.00..12.04 rows=500 width=0)
Index Cond: (a = 1)
(4 rows)
explain select * from foo where a = b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on foo (cost=0.00..1693.00 rows=500 width=8)
Filter: (a = b)
(2 rows)
We won't be able to use such a clause for pruning at all; neither
planning-time pruning nor execution-time pruning. Am I missing something?
2. This code is way more complex than it needs to be.
if (num_parts > 0)
{
int j;
all_indexes = (int *) palloc(num_parts * sizeof(int));
j = 0;
if (min_part_idx >= 0 && max_part_idx >= 0)
{
for (i = min_part_idx; i <= max_part_idx; i++)
all_indexes[j++] = i;
}
if (!bms_is_empty(other_parts))
while ((i = bms_first_member(other_parts)) >= 0)
all_indexes[j++] = i;
if (j > 1)
qsort((void *) all_indexes, j, sizeof(int), intcmp);
}
It looks like the min/max partition stuff is just complicating things
here. If you need to build this array of all_indexes[] anyway, I don't
quite understand the point of the min/max. It seems like min/max would
probably work a bit nicer if you didn't need the other_parts
BitmapSet, so I recommend just getting rid of min/max completely and
just have a BitmapSet with bit set for each partition's index you
need, you'd not need to go to the trouble of performing a qsort on an
array and you could get rid of quite a chunk of code too.
The entire function would then not be much more complex than:
partindexes = get_partitions_from_clauses(parent, partclauses);
while ((i = bms_first_member(partindexes)) >= 0)
{
AppendRelInfo *appinfo = rel->part_appinfos[i];
result = lappend(result, appinfo);
}
Then you can also get rid of your intcmp() function too.
The min/max partition index interface to partition.c's new
partition-pruning facility is intentional. You can find hints about how
this design came about in the following email from Robert:
/messages/by-id/CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
For range queries, it is desirable for the partitioning module to return
the set of qualifying contiguous partitions in a compact (O(1)) min/max
representation rather than having to enumerate all those indexes in the
set. It's nice to avoid iterating over that set twice -- once when
constructing the set in the partitioning module and then again in the
caller (in this case, planner) to perform the planning-related tasks per
selected partition.
We need the other_parts Bitmapset too, because selected partitions may not
always be contiguous, even in the case of range partitioning, if there are
OR clauses or if the default partition is also selected. While computing
the selected partition set from a given set of clauses, the partitioning
code keeps the min/max representation for as long as it makes sense to,
and once the selected partitions no longer appear to be contiguous it
switches to the Bitmapset mode.
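A toy sketch of that idea follows. It is not the patch's data structure: the struct, its field names and the 64-partition limit are all invented for illustration, and the choice to drop the range entirely when switching modes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Invented for illustration; limited to partition indexes 0..63. */
typedef struct DemoPartSet
{
    int         min_idx;        /* -1 when in bitmap mode */
    int         max_idx;        /* -1 when in bitmap mode */
    uint64_t    other;          /* used only in bitmap mode */
} DemoPartSet;

/* Bits lo..hi (inclusive) set; needs 0 <= lo <= hi <= 63. */
static uint64_t
demo_range_bits(int lo, int hi)
{
    assert(lo >= 0 && lo <= hi && hi <= 63);
    return (~(uint64_t) 0 << lo) & (~(uint64_t) 0 >> (63 - hi));
}

/* Union of two contiguous ranges, as for "range1 OR range2". */
static DemoPartSet
demo_union_ranges(int lo1, int hi1, int lo2, int hi2)
{
    DemoPartSet r;

    assert(lo1 <= hi1 && lo2 <= hi2);
    if (lo2 <= hi1 + 1 && lo1 <= hi2 + 1)
    {
        /* Overlapping or adjacent: the result is still one O(1) range. */
        r.min_idx = lo1 < lo2 ? lo1 : lo2;
        r.max_idx = hi1 > hi2 ? hi1 : hi2;
        r.other = 0;
    }
    else
    {
        /* A gap appeared: give up on min/max and switch to bitmap mode. */
        r.min_idx = -1;
        r.max_idx = -1;
        r.other = demo_range_bits(lo1, hi1) | demo_range_bits(lo2, hi2);
    }
    return r;
}
```

The range case stays constant-size no matter how many partitions it covers, which is the property being argued for here; the cost is the extra branch and the second representation.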
3. Following code has the wrong size calculation:
memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *));
should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have
worked on your machine if you're compiling as 32 bit.
Oops, you're right. Fixed.
I'll continue on with the review in the next few days.
Thanks again.
Attached is the updated set of patches.
Regards,
Amit
Attachments:
0001-Add-new-tests-for-partition-pruning-v10.patch (text/plain; charset=UTF-8)
From e13d6eda12d95dff7a11aed16841948c427f19af Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/6] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 986 ++++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 140 +++++
4 files changed, 1128 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..6c669ffdfc
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,986 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(31 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on rlp_default_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a is not null;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3abcd
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3efgh
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_10
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_30
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NOT NULL)
+(29 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_null
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(25 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain select * from mc2p where a < 2;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p0 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a < 2)
+(9 rows)
+
+explain select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain select * from mc2p where a > 1;
+ QUERY PLAN
+---------------------------------------------------------------------
+ Append (cost=0.00..153.00 rows=3012 width=8)
+ -> Seq Scan on mc2p3 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5 (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default (cost=0.00..38.25 rows=753 width=8)
+ Filter: (a > 1)
+(9 rows)
+
+explain select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+------------------------------------------------------------
+ Append (cost=0.00..43.90 rows=4 width=8)
+ -> Seq Scan on mc2p2 (cost=0.00..43.90 rows=4 width=8)
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..75e8a58f36
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,140 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;
+explain (costs off) select * from rlp where a is not null;
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain select * from mc2p where a < 2;
+explain select * from mc2p where a = 2 and b < 1;
+explain select * from mc2p where a > 1;
+explain select * from mc2p where a = 1 and b > 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p;
--
2.11.0
Attachment: 0002-Planner-side-changes-for-partition-pruning-v10.patch (text/plain; charset=UTF-8)
From 88ec0ff42aa9359d89ffb738c854dedc2db0da74 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 2/6] Planner-side changes for partition-pruning

This adds all the necessary planner code and representations, viz.:

0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning).
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers to the
AppendRelInfos of *all* partitions (stored in the same order as
their RelOptInfos in part_rels).
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos, as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (note that, as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions).

If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it's
obvious in the planner that a set of clauses identified as matching
the partition key doesn't yet contain the constant values, in
which case there is no need to call get_partitions_from_clauses
right away. Instead, the call should be deferred to another piece of
code, which can receive the above list of clauses and runs at a time
when the constant values become available.

In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.

Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 23 ++
src/backend/optimizer/path/allpaths.c | 603 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 101 ++++++
src/include/catalog/partition.h | 4 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
9 files changed, 682 insertions(+), 130 deletions(-)
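
A reading aid for this patch: get_partitions_from_clauses reports the surviving partitions as a contiguous index range plus a set of additional indexes outside that range, and get_append_rel_partitions flattens both into one sorted array before fetching the AppendRelInfos. Below is a minimal standalone sketch of that flattening step. It assumes a plain int array in place of the Bitmapset, and the names collect_part_indexes and cmp_int are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints, equivalent in effect to intcmp() in the patch */
static int
cmp_int(const void *va, const void *vb)
{
	int			a = *(const int *) va;
	int			b = *(const int *) vb;

	return (a > b) - (a < b);
}

/*
 * Hypothetical helper: merge the contiguous range [min_idx, max_idx]
 * (ignored if either end is negative) with the "other" indexes into one
 * sorted array.  Returns the number of entries written to out, which the
 * caller must have sized to hold all of them.
 */
static int
collect_part_indexes(int min_idx, int max_idx,
					 const int *others, int nothers,
					 int *out)
{
	int			n = 0;
	int			i;

	assert(out != NULL);

	/* First the contiguous range, if one was reported. */
	if (min_idx >= 0 && max_idx >= 0)
	{
		for (i = min_idx; i <= max_idx; i++)
			out[n++] = i;
	}

	/* Then the indexes that fall outside the range. */
	for (i = 0; i < nothers; i++)
		out[n++] = others[i];

	/* Sort so that the result follows partition descriptor order. */
	if (n > 1)
		qsort(out, n, sizeof(int), cmp_int);

	return n;
}
```

With the stub in this patch the range always covers every partition and the extra set is empty, so the sort only starts to matter once real pruning is implemented.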
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5daa8a1c19..7b0e022865 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,29 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * *min_part_idx and *max_part_idx constitute a range of contiguous
+ * indexes of partitions satisfying the query, while *other_parts
+ * contains indexes of partitions that satisfy the query but are
+ * not included in the aforementioned range
+ */
+void
+get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a6efb4e1d3..452ecb9b03 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,11 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +137,13 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -845,6 +854,398 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
rel->fdwroutine->GetForeignPaths(root, rel, rte->relid);
}
+static int
+intcmp(const void *va, const void *vb)
+{
+ int a = *((const int *) va);
+ int b = *((const int *) vb);
+
+ if (a == b)
+ return 0;
+ return (a > b) ? 1 : -1;
+}
+
+/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i,
+ num_parts = 0,
+ min_part_idx = -1,
+ max_part_idx = -1,
+ *all_indexes = NULL;
+ Bitmapset *other_parts = NULL;
+ bool contains_const,
+ constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * an invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /*
+ * If the matched clauses contain at least some constant operands, use
+ * the same to prune partitions right away.
+ */
+ if (partclauses != NIL && contains_const && !constfalse)
+ {
+ get_partitions_from_clauses(parent, partclauses,
+ &min_part_idx, &max_part_idx,
+ &other_parts);
+ /* Get *all* indexes in one place and sort. */
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ num_parts += (max_part_idx - min_part_idx + 1);
+ if (!bms_is_empty(other_parts))
+ num_parts += bms_num_members(other_parts);
+
+ if (num_parts > 0)
+ {
+ int j;
+
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ j = 0;
+ if (min_part_idx >= 0 && max_part_idx >= 0)
+ {
+ for (i = min_part_idx; i <= max_part_idx; i++)
+ all_indexes[j++] = i;
+ }
+ if (!bms_is_empty(other_parts))
+ while ((i = bms_first_member(other_parts)) >= 0)
+ all_indexes[j++] = i;
+ if (j > 1)
+ qsort((void *) all_indexes, j, sizeof(int), intcmp);
+ }
+ }
+ else if (!constfalse)
+ {
+ /* No clauses to prune partitions, so scan all partitions. */
+ num_parts = partdesc->nparts;
+ all_indexes = (int *) palloc(num_parts * sizeof(int));
+ for (i = 0; i < partdesc->nparts; i++)
+ all_indexes[i] = i;
+ }
+
+ /* Fetch the partition appinfos. */
+ for (i = 0; i < num_parts; i++)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[all_indexes[i]];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[all_indexes[i]] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ if (all_indexes)
+ pfree(all_indexes);
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool contains_const1,
+ constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ result = lappend(result, clause);
+ *contains_const = *contains_const || contains_const1;
+ }
+ else if (and_clause((Node *) clause))
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely about
+ * the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+
+ *contains_const = *contains_const || IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it
+ * if its negator is indeed a part of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ }
+ }
+
+ return result;
+}
+
/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
@@ -860,6 +1261,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1275,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1312,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1325,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,85 +1335,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1164,6 +1505,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1259,14 +1611,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1337,43 +1704,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of root partitioned tables, get the
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1390,17 +1754,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..b06696b7f0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..7da99a9f41 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,8 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+void get_partitions_from_clauses(Relation relation, List *partclauses,
+ int *min_part_idx, int *max_part_idx,
+ Bitmapset **other_parts);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's type,
+ * store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
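
As a reading aid for the match_clauses_to_partkey() hunk in the patch above, here is a minimal standalone sketch of its operand-normalization step: a clause of the form (const op partkey) is rewritten as (partkey op' const), where op' is the commutator of op, and the clause is rejected if no commutator exists. All names here (ToyOp, ToyClause, toy_commutator, toy_normalize) are hypothetical stand-ins, not PostgreSQL definitions; the real code works on OpExpr nodes and calls get_commutator().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for operator OIDs; not PostgreSQL definitions. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_NONE } ToyOp;

typedef struct
{
    bool is_partkey;    /* operand is the partition key column */
    int  value;         /* constant value, meaningful when !is_partkey */
} ToyOperand;

typedef struct
{
    ToyOperand left;
    ToyOp      op;
    ToyOperand right;
} ToyClause;

/* Plays the role that get_commutator() plays in the patch. */
static ToyOp
toy_commutator(ToyOp op)
{
    switch (op)
    {
        case OP_LT: return OP_GT;
        case OP_LE: return OP_GE;
        case OP_EQ: return OP_EQ;
        case OP_GE: return OP_LE;
        case OP_GT: return OP_LT;
        default:    return OP_NONE;
    }
}

/*
 * Rewrite a clause into the form (partkey op const).  Returns false when
 * exactly one operand is not the partition key, or when the operator has
 * no commutator and the operands therefore cannot be flipped.
 */
static bool
toy_normalize(const ToyClause *in, ToyClause *out)
{
    assert(in != NULL && out != NULL);

    if (in->left.is_partkey && !in->right.is_partkey)
    {
        *out = *in;             /* already in the expected form */
        return true;
    }
    if (in->right.is_partkey && !in->left.is_partkey)
    {
        ToyOp flipped = toy_commutator(in->op);

        if (flipped == OP_NONE)
            return false;       /* cannot flip the operands; give up */
        out->left = in->right;
        out->op = flipped;
        out->right = in->left;
        return true;
    }
    return false;               /* no usable (partkey, const) pairing */
}
```

With this model, the clause "10 < key" normalizes to "key > 10", so downstream code can always look for the constant on the right-hand side, which is the expectation the patch's comment describes.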
Attachment: 0003-Implement-get_partitions_from_clauses-v10.patch (text/plain; charset=UTF-8)
From b4420e9b7e07a794cdea87cf4e5a9f1702bd86dc Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 3/6] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1126 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 1122 insertions(+), 4 deletions(-)
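
The diff that follows adds a PartitionSet structure with empty and all_parts flags and declares partset_intersect() and partset_union(), whose bodies are not part of this excerpt. The sketch below is only a simplified model of how such set operations could behave, assuming that ANDed clauses narrow the set by intersection and ORed clauses widen it by union. It is not the patch's implementation: ToyPartSet and the toy_* functions are hypothetical, and a single 64-bit mask stands in for both the min/max index range and the Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of a partition set.  As in the patch's PartitionSet, when
 * either flag is true the remaining field is not meaningful.
 */
typedef struct
{
    bool     empty;         /* contains no partitions */
    bool     all_parts;     /* contains all partitions */
    uint64_t members;       /* bit i set => partition i is in the set */
} ToyPartSet;

/* Build a set holding partitions min_idx..max_idx, inclusive. */
static ToyPartSet
toy_from_range(int min_idx, int max_idx)
{
    ToyPartSet s = { false, false, 0 };
    int        i;

    assert(min_idx >= 0 && max_idx < 64);
    for (i = min_idx; i <= max_idx; i++)
        s.members |= UINT64_C(1) << i;
    s.empty = (s.members == 0);
    return s;
}

/* Partitions satisfying both inputs (ANDed conditions). */
static ToyPartSet
toy_intersect(ToyPartSet a, ToyPartSet b)
{
    ToyPartSet r = { false, false, 0 };

    if (a.empty || b.empty)
    {
        r.empty = true;
        return r;
    }
    if (a.all_parts)
        return b;
    if (b.all_parts)
        return a;
    r.members = a.members & b.members;
    r.empty = (r.members == 0);
    return r;
}

/* Partitions satisfying either input (ORed conditions). */
static ToyPartSet
toy_union(ToyPartSet a, ToyPartSet b)
{
    ToyPartSet r = { false, false, 0 };

    if (a.all_parts || b.all_parts)
    {
        r.all_parts = true;
        return r;
    }
    if (a.empty)
        return b;
    if (b.empty)
        return a;
    r.members = a.members | b.members;
    return r;
}
```

The two flags let the common cases (nothing survives, or no clause restricts anything) short-circuit without touching the member data, which appears to be the motivation for carrying them separately in the patch's structure as well.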
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7b0e022865..e2c4e1fbcb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -37,6 +37,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +113,100 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column, kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
+ /* A data structure to represent a partition set. */
+typedef struct PartitionSet
+{
+ /*
+ * If either empty or all_parts is true, values of the other fields are
+ * invalid.
+ */
+ bool empty; /* contains no partitions */
+ bool all_parts; /* contains all partitions */
+
+ /*
+ * In the case of range partitioning, min_part_idx contains the index of
+ * the lowest partition contained in the set and max_part_idx that of
+ * the highest partition (all partitions between these two indexes,
+ * inclusive, are part of the set). Since other types of partitioning do
+ * not impose order on the data contained in successive partitions, these
+ * fields are not set in that case.
+ */
+ bool use_range;
+ int min_part_idx;
+ int max_part_idx;
+
+ /*
+ * other_parts contains the indexes of partitions that are not covered by
+ * the range defined by min/max indexes. For example, in the case of
+ * range partitioning, it will include the default partition's index (if any).
+ * Also, this is the only way to return list partitions, because list
+ * partitions do not have the same ordering property as range partitions,
+ * so it's pointless to use the min/max range method.
+ */
+ Bitmapset *other_parts;
+} PartitionSet;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +246,25 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static PartitionSet *get_partitions_from_clauses_guts(Relation relation,
+ List *clauses);
+static PartitionSet *partset_copy(const PartitionSet *in);
+static PartitionSet *partset_intersect(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_union(PartitionSet *a, const PartitionSet *b);
+static PartitionSet *partset_new(bool empty, bool all_parts);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static PartitionSet *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1537,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1438,15 +1553,1018 @@ get_partitions_from_clauses(Relation relation, List *partclauses,
Bitmapset **other_parts)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ List *partconstr = RelationGetPartitionQual(relation);
+ PartitionSet *partset;
+
+ if (partconstr)
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ partset = get_partitions_from_clauses_guts(relation, partclauses);
+ if (partset->empty)
+ {
+ *min_part_idx = *max_part_idx = -1;
+ *other_parts = NULL;
+ }
+ else if (partset->all_parts)
+ {
+ *min_part_idx = 0;
+ *max_part_idx = partdesc->nparts - 1;
+ *other_parts = NULL;
+ }
+ else
+ {
+ if (partset->use_range)
+ {
+ *min_part_idx = partset->min_part_idx;
+ *max_part_idx = partset->max_part_idx;
+ }
+ else
+ *min_part_idx = *max_part_idx = -1;
- *min_part_idx = 0;
- *max_part_idx = partdesc->nparts - 1;
- *other_parts = NULL;
+ *other_parts = partset->other_parts;
+ }
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list (return value describes the set of such partitions)
+ *
+ * clauses is implicitly ANDed; any OR clauses found in it are processed by
+ * recursing on each of their arguments and taking the union of the resulting
+ * partition sets.
+ */
+static PartitionSet *
+get_partitions_from_clauses_guts(Relation relation, List *clauses)
+{
+ PartitionSet *partset;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ if (constfalse)
+ /* None of the partitions will satisfy the clauses. */
+ partset = partset_new(true, false);
+ else if (nkeys > 0)
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ partset = get_partitions_for_keys(relation, &keys);
+ else
+ /* No constraints on the keys, so, return *all* partitions. */
+ partset = partset_new(false, true);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ PartitionSet *or_partset = partset_new(true, false);
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ PartitionSet *arg_partset;
+
+ arg_partset = get_partitions_from_clauses_guts(relation,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = partset_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ partset = partset_intersect(partset, or_partset);
+ }
+
+ return partset;
+}
+
+/* Partition set manipulation functions. */
+
+static PartitionSet *
+partset_new(bool empty, bool all_parts)
+{
+ PartitionSet *result = palloc0(sizeof(PartitionSet));
+
+ result->empty = empty;
+ result->all_parts = all_parts;
+ /*
+ * Remains true until we explicitly turn it off in partset_union in a
+ * certain case.
+ */
+ result->use_range = true;
+ result->min_part_idx = result->max_part_idx = -1;
+ result->other_parts = NULL;
+
+ return result;
+}
+
+static PartitionSet *
+partset_copy(const PartitionSet *in)
+{
+ PartitionSet *result;
+
+ if (in == NULL)
+ return NULL;
+
+ result = partset_new(in->empty, in->all_parts);
+ result->min_part_idx = in->min_part_idx;
+ result->max_part_idx = in->max_part_idx;
+ result->other_parts = in->other_parts; /* not bms_copy. */
+
+ return result;
+}
+
+/*
+ * Macros to manipulate the range of partitions specified in a given
+ * PartitionSet (s) using its min_part_idx and max_part_idx fields, which are
+ * both inclusive ends of the range.
+ */
+
+#define partset_range_empty(s)\
+ ((s)->min_part_idx < 0 && (s)->max_part_idx < 0)
+
+#define partset_range_overlap(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->min_part_idx >= (s2)->min_part_idx &&\
+ (s1)->min_part_idx <= (s2)->max_part_idx) ||\
+ ((s2)->min_part_idx >= (s1)->min_part_idx &&\
+ (s2)->min_part_idx <= (s1)->max_part_idx)))
+
+#define partset_range_adjacent(s1, s2)\
+ (!partset_range_empty((s1)) && !partset_range_empty((s2)) &&\
+ (((s1)->max_part_idx + 1 == (s2)->min_part_idx) || \
+ ((s2)->max_part_idx + 1 == (s1)->min_part_idx)))
+
+/* The result after intersection is stuffed back into 'a'. */
+static PartitionSet *
+partset_intersect(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->all_parts || b->empty)
+ a = partset_copy(b);
+ else if (b->all_parts)
+ return a;
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+
+ /*
+ * If one or both sets' range is empty, or if they don't overlap,
+ * then the result's range is empty.
+ */
+ if (partset_range_empty(a) ||
+ partset_range_empty(b) ||
+ !partset_range_overlap(a, b))
+ {
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ else
+ {
+ a->min_part_idx = Max(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Min(a->max_part_idx, b->max_part_idx);
+ }
+
+ a->other_parts = bms_intersect(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/* The result after union is stuffed back into 'a'. */
+static PartitionSet *
+partset_union(PartitionSet *a, const PartitionSet *b)
+{
+ Assert(a != NULL && b != NULL);
+
+ if (a->empty || b->all_parts)
+ a = partset_copy(b);
+ else
+ {
+ /*
+ * Partition set is specified by min_part_idx, max_part_idx and/or
+ * other_parts, so make the result set using those fields.
+ */
+ int i;
+
+ /*
+ * Combine b's range into a's only if we're still using the range
+ * representation.
+ */
+ if (a->use_range)
+ {
+ if (!partset_range_empty(a) && !partset_range_empty(b))
+ {
+ /*
+ * Unify into one range using range union only if it makes
+ * sense, that is only if they are adjacent to or overlap with
+ * each other. If not, unify them by adding indexes within
+ * both ranges to the other_parts bitmap and mark the set as
+ * no longer using the range representation, because, the
+ * indexes in this no longer have the property of being
+ * contiguous.
+ */
+ if (partset_range_overlap(a, b) ||
+ partset_range_adjacent(a, b))
+ {
+ a->min_part_idx = Min(a->min_part_idx, b->min_part_idx);
+ a->max_part_idx = Max(a->max_part_idx, b->max_part_idx);
+ }
+ else
+ {
+ for (i = a->min_part_idx; i <= a->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+
+ /* The set is no longer to be represented as range. */
+ a->use_range = false;
+ a->min_part_idx = a->max_part_idx = -1;
+ }
+ }
+ else if (partset_range_empty(a))
+ {
+ a->min_part_idx = b->min_part_idx;
+ a->max_part_idx = b->max_part_idx;
+ }
+ }
+ else
+ {
+ if (!partset_range_empty(b))
+ {
+ for (i = b->min_part_idx; i <= b->max_part_idx; i++)
+ a->other_parts = bms_add_member(a->other_parts, i);
+ }
+ }
+
+ a->other_parts = bms_union(a->other_parts, b->other_parts);
+ }
+
+ return a;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and max bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_bool_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ *or_clauses = lappend(*or_clauses, clause);
+ else
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Skip if leftop doesn't match this partition key column. */
+ if ((!IsA(leftop, Var) ||
+ ((Var *) leftop)->varattno != partattno) &&
+ !equal(leftop, partexpr))
+ continue;
+
+ /*
+ * Deal with <> operators that the planner allows if it finds
+ * out that <>'s negator is indeed a valid partopfamily member.
+ * Make an equivalent OR expression and add to the *or_clauses
+ * list. That is, we convert a <> opclause into
+ * (leftop < rightop) OR (leftop > rightop).
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily) &&
+ (partkey->strategy == PARTITION_STRATEGY_RANGE ||
+ partkey->strategy == PARTITION_STRATEGY_LIST))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ negator = get_negator(opclause->opno);
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr),
+ -1));
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * Planner must have accepted this saop iff saop_op's negator
+ * was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, elemno;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (elemno = 0; elemno < num_elems; elemno++)
+ {
+ if (!elem_nulls[elemno])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[elemno],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_bool_clauses = false;
+ }
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_bool_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Don't pass as the equal key
+ * otherwise.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for this partition key column
+ * for each btree strategy number. Copy them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ Assert(IsA(leftarg->constarg, Const) &&
+ IsA(rightarg->constarg, Const));
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static PartitionSet *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ return partset_new(false, true);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
--
2.11.0
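
To make the range/bitmap bookkeeping of partset_union and partset_intersect in the patch above easier to follow, here is a minimal standalone sketch. It is not code from the patch: DemoPartSet and the demo_* functions are hypothetical names, a 64-bit mask stands in for Bitmapset (so it assumes fewer than 64 partitions), both inputs are assumed to start out in range form, and "adjacent" is taken to mean that one inclusive range ends exactly one index before the other begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's PartitionSet. */
typedef struct DemoPartSet
{
    bool     use_range;  /* is [min_idx, max_idx] still meaningful? */
    int      min_idx;    /* inclusive; -1 when the range is empty */
    int      max_idx;    /* inclusive; -1 when the range is empty */
    uint64_t other;      /* bit i set => partition i is in the set */
} DemoPartSet;

static DemoPartSet
demo_range(int min_idx, int max_idx)
{
    DemoPartSet s = { true, min_idx, max_idx, 0 };

    return s;
}

static bool
demo_range_empty(const DemoPartSet *s)
{
    return s->min_idx < 0 && s->max_idx < 0;
}

/* Inclusive ranges overlap if each one starts before the other ends. */
static bool
demo_range_overlap(const DemoPartSet *a, const DemoPartSet *b)
{
    return !demo_range_empty(a) && !demo_range_empty(b) &&
           a->min_idx <= b->max_idx && b->min_idx <= a->max_idx;
}

/* Adjacent: one range ends exactly one index before the other begins. */
static bool
demo_range_adjacent(const DemoPartSet *a, const DemoPartSet *b)
{
    return !demo_range_empty(a) && !demo_range_empty(b) &&
           (a->max_idx + 1 == b->min_idx || b->max_idx + 1 == a->min_idx);
}

/* Bitmask of the indexes covered by s's range (0 if the range is empty). */
static uint64_t
demo_range_bits(const DemoPartSet *s)
{
    uint64_t bits = 0;

    if (!demo_range_empty(s))
        for (int i = s->min_idx; i <= s->max_idx; i++)
            bits |= UINT64_C(1) << i;
    return bits;
}

/* Union b into a; fall back to the bitmap when the ranges cannot be merged. */
static void
demo_union(DemoPartSet *a, const DemoPartSet *b)
{
    if (a->use_range && !demo_range_empty(a) && !demo_range_empty(b) &&
        !demo_range_overlap(a, b) && !demo_range_adjacent(a, b))
    {
        /* Disjoint ranges: a can no longer be described by one range. */
        a->other |= demo_range_bits(a);
        a->use_range = false;
        a->min_idx = a->max_idx = -1;
    }

    if (!a->use_range)
        a->other |= demo_range_bits(b);
    else if (demo_range_empty(a))
    {
        a->min_idx = b->min_idx;
        a->max_idx = b->max_idx;
    }
    else if (!demo_range_empty(b))
    {
        a->min_idx = a->min_idx < b->min_idx ? a->min_idx : b->min_idx;
        a->max_idx = a->max_idx > b->max_idx ? a->max_idx : b->max_idx;
    }

    a->other |= b->other;
}

/* Intersect b into a; non-overlapping ranges leave an empty range. */
static void
demo_intersect(DemoPartSet *a, const DemoPartSet *b)
{
    if (demo_range_overlap(a, b))
    {
        a->min_idx = a->min_idx > b->min_idx ? a->min_idx : b->min_idx;
        a->max_idx = a->max_idx < b->max_idx ? a->max_idx : b->max_idx;
    }
    else
        a->min_idx = a->max_idx = -1;

    a->other &= b->other;
}
```

With these definitions, the union of [0,1] and [2,3] stays in range form as [0,3], whereas the union of [0,1] and [4,5] gives up the range form and records partitions 0, 1, 4 and 5 in the bitmap. Intersecting [0,5] with [3,8] leaves [3,5].
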
Attachment: 0004-Some-interface-changes-for-partition_bound_-cmp-bsea-v10.patch (text/plain; charset=UTF-8)
From 701e973f885534273d1509a5e14c23e2061dd198 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 4/6] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
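
As a rough illustration of why the probe has to carry its own datum count, the sketch below compares stored bounds against only the first ndatums values of a probe and uses that in a binary search. It is a simplified model, not the patch's code: DemoCmpArg, demo_bound_cmp and demo_bound_bsearch are hypothetical names, plain ints stand in for Datum, and MINVALUE/MAXVALUE bounds and the list-partitioning case are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_KEYS 4

/* Hypothetical stand-in for PartitionBoundCmpArg (non-bound probe only). */
typedef struct DemoCmpArg
{
    bool       is_bound;  /* always false in this sketch */
    const int *datums;    /* probe values, one per specified key column */
    int        ndatums;   /* how many leading key columns are specified */
} DemoCmpArg;

/*
 * Compare a stored bound with the probe, looking only at the first
 * arg->ndatums columns. Returns <0, 0 or >0 if the bound is less than,
 * equal to, or greater than the probe on that prefix.
 */
static int
demo_bound_cmp(const int *bound, const DemoCmpArg *arg)
{
    for (int i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] != arg->datums[i])
            return bound[i] < arg->datums[i] ? -1 : 1;
    }
    return 0;
}

/*
 * Binary search over bounds sorted in ascending order. Returns the index
 * of a bound equal to the probe if one is found (*is_equal is set), else
 * the greatest index whose bound is less than the probe, or -1 if there
 * is no such bound. With a prefix probe several bounds may compare equal;
 * this sketch returns whichever of them the search hits first.
 */
static int
demo_bound_bsearch(const int bounds[][DEMO_MAX_KEYS], int nbounds,
                   const DemoCmpArg *arg, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmp = demo_bound_cmp(bounds[mid], arg);

        if (cmp <= 0)
        {
            lo = mid;
            *is_equal = (cmp == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

A probe that specifies both columns of a two-column key and a probe that specifies only the first column can then share the same search routine; only ndatums differs between the two calls.
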
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e2c4e1fbcb..3ebedec335 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -207,6 +207,30 @@ typedef struct PartitionSet
Bitmapset *other_parts;
} PartitionSet;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -235,14 +259,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -938,10 +963,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -992,6 +1023,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1013,8 +1045,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1028,9 +1063,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3564,12 +3599,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3581,6 +3619,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3611,12 +3650,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3813,12 +3853,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3840,11 +3880,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3852,17 +3892,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3873,12 +3931,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3892,20 +3951,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3918,8 +3976,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
Attachment: 0005-Tweak-default-range-partition-s-constraint-a-little-v10.patch
From b5d331a9483ec61784b3bf22a3ec0aa9afc4dedd Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 31 Oct 2017 16:26:55 +0900
Subject: [PATCH 5/6] Tweak default range partition's constraint a little
When used as a predicate, it's useful for it to explicitly say that
the default range partition might contain nulls, because non-default
range partitions can't.
---
src/backend/catalog/partition.c | 29 +++++++++++++++++++++++------
src/test/regress/expected/update.out | 2 +-
2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3ebedec335..54fed3b5bc 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -3116,12 +3116,29 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (or_expr_args != NIL)
{
- /* OR all the non-default partition constraints; then negate it */
- result = lappend(result,
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args, -1)
- : linitial(or_expr_args));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ Expr *other_parts_constr;
+
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
return result;
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index cef70b1a1e..40217bdf9c 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -227,7 +227,7 @@ create table part_def partition of range_parted default;
a | text | | | | extended | |
b | integer | | | | plain | |
Partition of: range_parted DEFAULT
-Partition constraint: (NOT (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20))))
+Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20)))))
insert into range_parted values ('c', 9);
-- ok
--
2.11.0
Attachment: 0006-Implement-get_partitions_for_keys-v10.patch
From 3eee5ab229e4704499fe80471e7baa4aed10dc9a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 6/6] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 379 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 72 ++----
3 files changed, 401 insertions(+), 54 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 54fed3b5bc..152ae60a6b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2596,7 +2596,384 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static PartitionSet *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- return partset_new(false, true);
+ PartitionSet *partset;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return partset_new(true, false);
+
+ /*
+ * Initialize the set as one that's neither empty nor contains all
+ * partitions. The code below will set min_part_idx and max_part_idx
+ * and/or other_parts as found out by comparing keys to the partition
+ * bounds, as well as considering special partitions like null-accepting
+ * and default partitions. If it turns out that no partitions need to
+ * be scanned, partset->empty will be set to true.
+ */
+ partset = partset_new(false, false);
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ partset->other_parts = bms_make_singleton(other_idx);
+
+ return partset;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+ return partset;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ partset->all_parts = true;
+ return partset;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ partset->other_parts =
+ bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts =
+ bms_make_singleton(boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return partset;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+
+ /*
+ * minoff set to -1 means all datums are greater than
+ * minkeys, which means all partitions satisfy minkeys.
+ */
+ if (minoff == -1)
+ minoff = 0;
+
+ /*
+ * minkeys matched one of the datums (because, is_equal), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bound share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add to other_parts, because list partition indexes are not
+ * monotonically increasing like range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ partset->other_parts =
+ bms_add_member(partset->other_parts,
+ boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ partset->min_part_idx = boundinfo->indexes[minoff];
+ partset->max_part_idx = boundinfo->indexes[maxoff];
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ partset->other_parts = bms_add_member(partset->other_parts,
+ boundinfo->default_index);
+ else
+ partset->empty = true;
+
+ return partset;
}
/*
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 6c669ffdfc..35bbb5da97 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -675,16 +667,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -702,9 +692,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -714,9 +702,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -730,9 +716,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -771,16 +755,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -792,9 +774,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -821,8 +801,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -830,9 +810,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -872,9 +850,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -912,9 +888,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -926,9 +900,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
--
2.11.0
On 2017/11/06 12:53, David Rowley wrote:
On 3 November 2017 at 17:32, David Rowley <david.rowley@2ndquadrant.com> wrote:
2. This code is way more complex than it needs to be.
if (num_parts > 0)
{
int j;
all_indexes = (int *) palloc(num_parts * sizeof(int));
j = 0;
if (min_part_idx >= 0 && max_part_idx >= 0)
{
for (i = min_part_idx; i <= max_part_idx; i++)
all_indexes[j++] = i;
}
if (!bms_is_empty(other_parts))
while ((i = bms_first_member(other_parts)) >= 0)
all_indexes[j++] = i;
if (j > 1)
qsort((void *) all_indexes, j, sizeof(int), intcmp);
}
It looks like the min/max partition stuff is just complicating things
here. If you need to build this array of all_indexes[] anyway, I don't
quite understand the point of the min/max. It seems like min/max would
probably work a bit nicer if you didn't need the other_parts
Bitmapset, so I recommend getting rid of min/max completely and just
having a Bitmapset with a bit set for each partition index you need.
You'd then not need to go to the trouble of performing a qsort on an
array, and you could get rid of quite a chunk of code too.
The entire function would then not be much more complex than:
partindexes = get_partitions_from_clauses(parent, partclauses);
while ((i = bms_first_member(partindexes)) >= 0)
{
AppendRelInfo *appinfo = rel->part_appinfos[i];
result = lappend(result, appinfo);
}
Then you can also get rid of your intcmp() function.
I've read a bit more of the patch and I think even more now that the
min/max stuff should be removed.
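To illustrate why a bitmap makes the qsort unnecessary, here is a minimal standalone sketch. It is not PostgreSQL code: PartSet, partset_add, partset_next and partset_to_array are hypothetical names, and the set is limited to 64 members for brevity. Walking the bits from low to high yields the indexes in ascending order no matter how they were added, which is roughly what a loop over a real Bitmapset with bms_next_member() would give.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: holds partition indexes 0..63 (assumption). */
typedef uint64_t PartSet;

/* Return the set with idx added. */
static PartSet
partset_add(PartSet s, int idx)
{
    assert(idx >= 0 && idx < 64);
    return s | (UINT64_C(1) << idx);
}

/* Return the smallest member greater than prev, or -1; pass -1 to start. */
static int
partset_next(PartSet s, int prev)
{
    int i;

    for (i = prev + 1; i < 64; i++)
    {
        if (s & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}

/* Copy the members into out[] in ascending order; returns how many. */
static int
partset_to_array(PartSet s, int *out, int max_out)
{
    int n = 0;
    int i = -1;

    while ((i = partset_next(s, i)) >= 0)
    {
        assert(n < max_out);
        out[n++] = i;   /* already sorted, so no qsort and no intcmp */
    }
    return n;
}
```

Adding 7, 2 and 5 in that order and then calling partset_to_array() fills the output with 2, 5, 7.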
Oops, I didn't catch this email before sending my earlier reply. Thanks
for the bms range patch. Will reply to this shortly after reading your
patch and thinking on it a bit.
Thanks,
Amit
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 6 November 2017 at 17:30, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
1. This comment seems wrong.
/*
* Since the clauses in rel->baserestrictinfo should all contain Const
* operands, it should be possible to prune partitions right away.
*/
Yes. I used to think it was true, then realized it isn't and updated the
code to get rid of that assumption, but I forgot to update this comment.
Fixed.
How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
baserestrictinfo in this case will contain a single RestrictInfo with
an OpExpr containing two Var args and it'll come right through that
function too.
...
We won't be able to use such a clause for pruning at all; neither
planning-time pruning nor execution-time pruning. Am I missing something?
I just meant the comment was wrong.
The design with min/max partition index interface to the partition.c's new
partition-pruning facility is intentional. You can find hints about how
such a design came about in the following email from Robert:
/messages/by-id/CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
Yeah, I remember reading that before I had looked at the code. I
disagree with Robert on this. The fact that the min/max range gets
turned into a list of everything in that range in
get_append_rel_partitions means all the advantages that storing the
partitions as a range is voided. If you could have kept it a range the
entire time, then that might be different, but seems you need to
materialize the entire range in order to sort the partitions into
order. I've included Robert in just in case he wants to take a look at
the code that resulted from that design. Maybe something is not
following what he had in mind, or maybe he'll change his mind based on
the code that resulted.
For range queries, it is desirable for the partitioning module to return
the set of qualifying contiguous partitions in a compact (O(1))
min/max representation rather than having to enumerate all those indexes
in the set. It's nice to avoid iterating over that set twice -- once when
constructing the set in the partitioning module and then again in the
caller (in this case, planner) to perform the planning-related tasks per
selected partition.
The idea is that you still get the min and max from the bsearch, but
then use bms_add_range() to populate a bitmapset of the matching
partitions. The big-O notation of the search shouldn't change.
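A minimal standalone sketch of that idea follows. It is not taken from the patch: the integer bounds, the partition numbering, and the names bound_bsearch, partset_add_range and partitions_for_range are all assumptions made for illustration. Two binary searches give the first and last matching partition, and the range between them is then set in a bitmap, in the spirit of the proposed bms_add_range().

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: holds partition indexes 0..63 (assumption). */
typedef uint64_t PartSet;

/*
 * Greatest offset whose bound is <= probe, or -1 if every bound is greater.
 * Same contract as the partition_bound_bsearch() comment in the patch.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int probe)
{
    int lo = -1;
    int hi = nbounds - 1;

    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= probe)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Set every bit in [lower, upper]. */
static PartSet
partset_add_range(PartSet s, int lower, int upper)
{
    int i;

    assert(lower >= 0 && upper < 64);
    for (i = lower; i <= upper; i++)
        s |= UINT64_C(1) << i;
    return s;
}

/*
 * Partitions overlapping the inclusive key range [minkey, maxkey].
 * Toy layout: bounds[] holds nbounds ascending split points; partition 0
 * holds keys below bounds[0], and partition i (i >= 1) holds keys from
 * bounds[i - 1] up to, but not including, the next split point.
 */
static PartSet
partitions_for_range(const int *bounds, int nbounds, int minkey, int maxkey)
{
    int minpart;
    int maxpart;

    if (minkey > maxkey)
        return 0;               /* contradictory range: nothing to scan */
    minpart = bound_bsearch(bounds, nbounds, minkey) + 1;
    maxpart = bound_bsearch(bounds, nbounds, maxkey) + 1;
    return partset_add_range(0, minpart, maxpart);
}
```

With split points {10, 20, 30}, the range [12, 25] selects partitions 1 and 2. The cost of finding the two ends is still two O(log n) searches; only the representation of the result changes.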
We need the other_parts Bitmapset too, because selected partitions may not
always be contiguous, even in the case of range partitioning, if there are
OR clauses and the possibility that the default partition is also
selected. While computing the selected partition set from a given set of
clauses, partitioning code tries to keep the min/max representation as
long as it makes sense to and once the selected partitions no longer
appear to be contiguous it switches to the Bitmapset mode.
Yip. I understand that. I just think there's no benefit to having
min/max since it needs to be materialized into a list of the entire
range at some point; it might as well be done as soon as possible
using a bitmapset, which would save having all the partset_union,
partset_intersect, partset_range_empty, partset_range_overlap,
partset_range_adjacent code. You'd end up just using bms_union and
bms_intersect then bms_add_range to handle the min/max bound you get
from the bsearch.
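The bitmapset-only scheme described here can be sketched in a few lines of standalone C. This is an illustration under simplifying assumptions, not the patch's code: `partset` and the three helper functions are hypothetical names, and a single 64-bit word stands in for PostgreSQL's variable-length Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset; bit i set means partition i is selected. */
typedef uint64_t partset;

/* Select partitions lower..upper inclusive, e.g. the min/max found by a bsearch. */
partset
partset_from_range(int lower, int upper)
{
    partset mask;

    assert(lower >= 0 && lower <= upper && upper < 64);
    mask = ~(partset) 0 << lower;           /* clear bits below 'lower' */
    mask &= ~(partset) 0 >> (63 - upper);   /* clear bits above 'upper' */
    return mask;
}

/* OR of two clauses: a partition qualifies if either side selects it. */
partset
partset_or(partset a, partset b)
{
    return a | b;
}

/* AND of two clauses: a partition qualifies only if both sides select it. */
partset
partset_and(partset a, partset b)
{
    return a & b;
}
```

For a clause such as (a < 20 OR a >= 80), the two bsearch results might come back as partitions 0..1 and 8..9; partset_or() of the two ranges yields the non-contiguous set directly, with no separate min/max bookkeeping to fall out of.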
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017/11/06 13:15, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version of the patches
match_clauses_to_partkey() needs to allow for the way quals on Bool
columns are represented.
create table pt (a bool not null) partition by list (a);
create table pt_true partition of pt for values in('t');
create table pt_false partition of pt for values in('f');
explain select * from pt where a = true;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..76.20 rows=2810 width=1)
-> Seq Scan on pt_false (cost=0.00..38.10 rows=1405 width=1)
Filter: a
-> Seq Scan on pt_true (cost=0.00..38.10 rows=1405 width=1)
Filter: a
(5 rows)
match_clause_to_indexcol() shows an example of how to handle this.
explain select * from pt where a = false;
will need to be allowed too. This works slightly differently.
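The two shapes involved can be sketched as follows. This is a standalone illustration, not the patch's code: BoolQualShape and bool_qual_value are hypothetical names, and the sketch assumes the planner has already simplified `a = true` to a bare `a` (as the Filter lines in the plan above show) and `a = false` to `NOT a`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shapes a qual on a boolean partition key can take. */
typedef enum
{
    BOOLQUAL_VAR,       /* "a", what "a = true" is simplified to */
    BOOLQUAL_NOT_VAR,   /* "NOT a", what "a = false" is simplified to */
    BOOLQUAL_IS_TRUE,   /* "a IS TRUE" */
    BOOLQUAL_IS_FALSE,  /* "a IS FALSE" */
    BOOLQUAL_OTHER      /* anything this sketch does not recognize */
} BoolQualShape;

/*
 * If the qual pins the key to one boolean value, store that value in
 * *value and return true; otherwise return false (no pruning possible).
 */
bool
bool_qual_value(BoolQualShape shape, bool *value)
{
    assert(value != NULL);

    switch (shape)
    {
        case BOOLQUAL_VAR:
        case BOOLQUAL_IS_TRUE:
            *value = true;
            return true;
        case BOOLQUAL_NOT_VAR:
        case BOOLQUAL_IS_FALSE:
            *value = false;
            return true;
        default:
            return false;
    }
}
```

With a value in hand, the qual can be treated like `partkey = const`, so `where a = true` would need to scan only pt_true.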
You're right. I've fixed things to handle Boolean partitioning in the
updated set of patches I will post shortly.
Thanks,
Amit
On 2017/11/06 14:32, David Rowley wrote:
On 6 November 2017 at 17:30, Amit Langote wrote:
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote wrote:
1. This comment seems wrong.
/*
 * Since the clauses in rel->baserestrictinfo should all contain Const
 * operands, it should be possible to prune partitions right away.
 */
Yes. I used to think it was true, then realized it isn't and updated the
code to get rid of that assumption, but I forgot to update this comment.
Fixed.
How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
baserestrictinfo in this case will contain a single RestrictInfo with
an OpExpr containing two Var args and it'll come right through that
function too.
...
We won't be able to use such a clause for pruning at all; neither
planning-time pruning nor execution-time pruning. Am I missing something?
I just meant the comment was wrong.
Ah, gotcha.
The design with min/max partition index interface to the partition.c's new
partition-pruning facility is intentional. You can find hints about how
such a design came about in the following email from Robert:
/messages/by-id/CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
Yeah, I remember reading that before I had looked at the code. I
disagree with Robert on this. The fact that the min/max range gets
turned into a list of everything in that range in
get_append_rel_partitions means all the advantages of storing the
partitions as a range are voided. If you could have kept it a range the
entire time, then that might be different, but it seems you need to
materialize the entire range in order to sort the partitions into
order. I've included Robert just in case he wants to take a look at
the code that resulted from that design. Maybe something is not
following what he had in mind, or maybe he'll change his mind based on
the code that resulted.
For range queries, it is desirable for the partitioning module to return
the set of qualifying partitions that are contiguous in a compact (O(1))
min/max representation rather than having to enumerate all those indexes
in the set. It's nice to avoid iterating over that set twice -- once when
constructing the set in the partitioning module and then again in the
caller (in this case, planner) to perform the planning-related tasks per
selected partition.
The idea is that you still get the min and max from the bsearch, but
then use bms_add_range() to populate a bitmapset of the matching
partitions. The big-O notation of the search shouldn't change.
We need the other_parts Bitmapset too, because selected partitions may not
always be contiguous, even in the case of range partitioning, if there are
OR clauses and the possibility that the default partition is also
selected. While computing the selected partition set from a given set of
clauses, partitioning code tries to keep the min/max representation as
long as it makes sense to and once the selected partitions no longer
appear to be contiguous it switches to the Bitmapset mode.
Yip. I understand that. I just think there's no benefit to having
min/max since it needs to be materialized into a list of the entire
range at some point; it might as well be done as soon as possible
using a bitmapset, which would save having all the partset_union,
partset_intersect, partset_range_empty, partset_range_overlap,
partset_range_adjacent code. You'd end up just using bms_union and
bms_intersect then bms_add_range to handle the min/max bound you get
from the bsearch.
OK, I have gotten rid of the min/max partition index interface and instead
adopted the bms_add_range() approach by including your patch to add the
same in the patch set (which is now 0002 in the whole set). I have to
admit that it's simpler to understand the new code with just Bitmapsets to
look at, but I'm still a bit concerned about materializing the whole set
right within partition.c, although we can perhaps optimize it later.
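The cost of materializing the set depends on how the range is filled. The sketch below contrasts a bit-at-a-time loop with a word-at-a-time fill in the style of bms_add_range(). It is a standalone illustration with hypothetical names (fill_range_bitwise, fill_range_wordwise, fills_agree) and assumes 64-bit words and a fixed four-word array rather than a real Bitmapset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RANGE_WORD_BITS 64
#define RANGE_NWORDS    4
#define RANGE_WORDNUM(x) ((x) / RANGE_WORD_BITS)
#define RANGE_BITNUM(x)  ((x) % RANGE_WORD_BITS)

/* Set bits lower..upper one at a time: one iteration per selected partition. */
void
fill_range_bitwise(uint64_t *words, int lower, int upper)
{
    for (int i = lower; i <= upper; i++)
        words[RANGE_WORDNUM(i)] |= (uint64_t) 1 << RANGE_BITNUM(i);
}

/* Set bits lower..upper one word at a time: one iteration per 64 partitions. */
void
fill_range_wordwise(uint64_t *words, int lower, int upper)
{
    int lwordnum = RANGE_WORDNUM(lower);
    int uwordnum = RANGE_WORDNUM(upper);

    assert(lower >= 0 && lower <= upper);
    for (int wordnum = lwordnum; wordnum <= uwordnum; wordnum++)
    {
        uint64_t mask = ~(uint64_t) 0;

        /* In the first word, clear the bits below 'lower'. */
        if (wordnum == lwordnum)
            mask &= ~(uint64_t) 0 << RANGE_BITNUM(lower);
        /* In the last word, clear the bits above 'upper'. */
        if (wordnum == uwordnum)
            mask &= ~(uint64_t) 0 >> (RANGE_WORD_BITS - 1 - RANGE_BITNUM(upper));
        words[wordnum] |= mask;
    }
}

/* Return 1 if both strategies produce identical bit patterns for the range. */
int
fills_agree(int lower, int upper)
{
    uint64_t a[RANGE_NWORDS] = {0};
    uint64_t b[RANGE_NWORDS] = {0};

    assert(upper < RANGE_NWORDS * RANGE_WORD_BITS);
    fill_range_bitwise(a, lower, upper);
    fill_range_wordwise(b, lower, upper);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both functions set the same bits. The word-wise version does work proportional to the number of words spanned rather than the number of partitions, which limits the cost of building the full set inside partition.c.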
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
Thanks,
Amit
Attachments:
0002-Add-bms_add_range-to-add-members-within-the-specifie-v11.patch (text/plain; charset=UTF-8)
From cba48504d99e8149b6598dbb011f8e1c5a46738a Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 6 Nov 2017 14:39:33 +1300
Subject: [PATCH 2/7] Add bms_add_range to add members within the specified
range
The same behavior could be obtained by looping and using bms_add_member,
however, using bms_add_range operates at the bitmapword level and should
be far faster when the range is large.
Author: David Rowley
---
src/backend/nodes/bitmapset.c | 74 +++++++++++++++++++++++++++++++++++++++++++
src/include/nodes/bitmapset.h | 1 +
2 files changed, 75 insertions(+)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index bf8545d437..34d242b357 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -785,6 +785,80 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
}
/*
+ * bms_add_range
+ * Add members in the range of 'lower' to 'upper' to the set.
+ *
+ * Note this could also be done by calling bms_add_member in a loop, however,
+ * using this function will be faster when the range is large as we work at
+ * the bitmapword level rather than at the bit level.
+ */
+Bitmapset *
+bms_add_range(Bitmapset *a, int lower, int upper)
+{
+ int lwordnum,
+ uwordnum,
+ wordnum;
+
+ if (lower < 0 || upper < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+ if (lower > upper)
+ elog(ERROR, "lower range must not be above upper range");
+ uwordnum = WORDNUM(upper);
+
+ if (a == NULL)
+ {
+ a = (Bitmapset *) palloc0(BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ }
+
+ /* ensure we have enough words to store the upper bit */
+ else if (uwordnum >= a->nwords)
+ {
+ int oldnwords = a->nwords;
+ int i;
+
+ a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ /* zero out the enlarged portion */
+ for (i = oldnwords; i < a->nwords; i++)
+ a->words[i] = 0;
+ }
+
+ wordnum = lwordnum = WORDNUM(lower);
+
+ /*
+ * Starting at lower's wordnum, loop over each word up to upper's wordnum.
+ * Along the way set all bits inclusively between lower and upper to 1. We
+ * only need to handle the lwordnum and uwordnum specially so we don't set
+ * any bits outside of the range.
+ */
+ while (wordnum <= uwordnum)
+ {
+ bitmapword mask = (bitmapword) ~0;
+
+ /* If working on the lower word, zero out bits below 'lower'. */
+ if (wordnum == lwordnum)
+ {
+ int lbitnum = BITNUM(lower);
+ mask >>= lbitnum;
+ mask <<= lbitnum;
+ }
+
+ /* Likewise, if working on the upper word, zero bits above 'upper' */
+ if (wordnum == uwordnum)
+ {
+ int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+ mask <<= ushiftbits;
+ mask >>= ushiftbits;
+ }
+
+ a->words[wordnum++] |= mask;
+ }
+
+ return a;
+}
+
+/*
* bms_int_members - like bms_intersect, but left input is recycled
*/
Bitmapset *
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index aa3fb253c2..3b62a97775 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,6 +90,7 @@ extern bool bms_is_empty(const Bitmapset *a);
extern Bitmapset *bms_add_member(Bitmapset *a, int x);
extern Bitmapset *bms_del_member(Bitmapset *a, int x);
extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_join(Bitmapset *a, Bitmapset *b);
--
2.11.0
0003-Planner-side-changes-for-partition-pruning-v11.patch (text/plain; charset=UTF-8)
From 902c005fb3c064da67550fea6d29a8bb21a8fb28 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 3/7] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (note that as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions)
If the clauses identified in 2 above do not contain values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it's
obvious in the planner that a set of clauses identified as matching
the partition key don't contain the constant values right away, in
which case, there is no need to call get_partitions_from_clauses
right away. Instead, it should be deferred to another piece of code
which can receive the above list of clauses and runs at a time when
the constant values become available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 18 ++
src/backend/optimizer/path/allpaths.c | 587 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/indxpath.c | 3 -
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 20 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 101 ++++++
src/include/catalog/partition.h | 2 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
11 files changed, 662 insertions(+), 133 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5daa8a1c19..5e601dd0a4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1421,6 +1421,24 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, List *partclauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a6efb4e1d3..77b13ad397 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,12 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +138,13 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -846,6 +856,381 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ List *partclauses;
+ List *result = NIL;
+ int i;
+ Bitmapset *partindexes = NULL;
+ bool contains_const,
+ constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including information
+ * about any nullness tests against partition keys. Set keynullness to
+ * a invalid value of NullTestType, which 0 is not.
+ */
+ partclauses = match_clauses_to_partkey(rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /*
+ * If the matched clauses contain at least some constant operands, use
+ * the same to prune partitions right away.
+ */
+ if (partclauses != NIL && contains_const && !constfalse)
+ partindexes = get_partitions_from_clauses(parent, partclauses);
+ else if (!constfalse)
+ /* No clauses to prune partitions, so scan all partitions. */
+ partindexes = bms_add_range(partindexes, 0, partdesc->nparts - 1);
+
+ /* Fetch the partition appinfos. */
+ while ((i = bms_first_member(partindexes)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ /* Remember for future users such as set_append_rel_pathlist(). */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause:
+ *
+ * 1. must be in the form (partkey op const) or (const op partkey);
+ * 2. must contain an operator which is in the same operator family as the
+ * partitioning operator for the partition key column
+ * 3. its input collation must match the partitioning collation
+ *
+ * The "const" mentioned in 1 means any expression that doesn't involve a
+ * volatile function or a Var of this relation. We allow Vars belonging to
+ * other relations (for example, if the clause is a join clause), but they
+ * are treated as parameters whose values are not known now, so cannot be
+ * used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join
+ * clauses appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If clauses contains at least one constant operand or a Nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ int i;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ bool contains_const1,
+ constfalse1;
+
+ /*
+ * If the OR's args contain clauses that match, add the clause
+ * to the result.
+ */
+ if (or_clause((Node *) clause) &&
+ match_clauses_to_partkey(rel,
+ list_copy(((BoolExpr *) clause)->args),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ result = lappend(result, clause);
+ *contains_const = contains_const1;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * These clauses are ANDed with the clauses in the
+ * original list, so queue them after the latter. Note
+ * that it also means that a queued clause will be added to
+ * the result if it happens to match.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the operator is compatible with partitioning and if
+ * so, add it to the list of opclauses matched with this partition
+ * key.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /*
+ * Check if the operator is in the partition operator family.
+ * If the operator happens to be '<>', which is never listed
+ * as part of the operator family, check if its negator
+ * exists and that the latter is compatible with partitioning.
+ */
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning. Flip the left and right
+ * args if we have to, because the code that extracts the
+ * constant value to use for partition-pruning expects to find
+ * it as the rightop of the clause.
+ */
+ if (constexpr == rightop)
+ result = lappend(result, clause);
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(clause);
+ commuted->opno = expr_op;
+ commuted->opfuncid = get_opcode(expr_op);
+ commuted->args = list_make2(rightop, leftop);
+ result = lappend(result, commuted);
+ }
+
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which while not
+ * listed as part of any operator family, we are able to
+ * handle the same if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ }
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which we accept if the
+ * partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN)
+ if (equal((Node *) btest->arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var) && equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1245,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1259,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1296,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1309,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,85 +1319,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
/*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
- /*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
- */
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1164,6 +1489,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ */
+ if (childrel->part_scheme && rel->part_scheme)
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1259,14 +1595,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1337,43 +1688,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For partitioned tables, we collect its live partitioned children
+ * from rel->painfo. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1390,17 +1738,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f35380391a..f4203ce200 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -40,9 +40,6 @@
#include "utils/selfuncs.h"
-#define IsBooleanOpfamily(opfamily) \
- ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
-
#define IndexCollMatchesExprColl(idxcollation, exprcollation) \
((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2b868c52de..3e943391b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c887..24d800d8b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6182,14 +6182,24 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(rel->live_partitioned_rels != NIL &&
+ list_length(rel->live_partitioned_rels) > 0);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3bd1063aa8..b06696b7f0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -735,6 +745,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1747,3 +1758,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of the child rel from the parent rel,
+ * such as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ AttrNumber attno;
+
+ if (rel->part_scheme)
+ {
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 945ac0239d..4a1ce92569 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -108,4 +108,6 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
+/* For partition-pruning */
+Bitmapset *get_partitions_from_clauses(Relation relation, List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e085cefb7b..ecf70a66c4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key column
+ * that is not the same as the collation implied by the column's type, store
+ * the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
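A side note on the rewritten get_partitioned_child_rels_for_join() above: it walks join_relids with the usual bms_next_member() loop and skips slots that are out of range or still empty. Below is a rough standalone sketch of that loop shape, not code from the patch. A 64-bit word stands in for the Bitmapset, and SketchRel, next_member() and count_live_partitioned() are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RelOptInfo: only what the sketch needs. */
typedef struct SketchRel
{
	int			relid;
	int			n_live_partitioned; /* length of live_partitioned_rels */
} SketchRel;

/*
 * Next set bit after 'prev', or -2 when none is left.  This mirrors the
 * contract of bms_next_member(), restricted to a single 64-bit word.
 */
static int
next_member(uint64_t set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if (set & (UINT64_C(1) << i))
			return i;
	}
	return -2;
}

/* Sum the live partitioned rels over all members of join_relids. */
static int
count_live_partitioned(uint64_t join_relids,
					   SketchRel *const *rel_array, int array_size)
{
	int			total = 0;
	int			relid = -1;

	while ((relid = next_member(join_relids, relid)) >= 0)
	{
		const SketchRel *rel;

		if (relid >= array_size)	/* ignore bogus indexes */
			continue;
		rel = rel_array[relid];
		if (rel == NULL)			/* slot was never filled in */
			continue;
		assert(rel->relid == relid);
		total += rel->n_live_partitioned;
	}
	return total;
}
```

The real function concatenates copies of the lists rather than counting, but the skip-and-continue structure is the same.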
0004-Implement-get_partitions_from_clauses-v11.patch
From 7bd8adf227aa280104af146f6f42650b8893779d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 4/7] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
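Note for readers: get_partitions_from_clauses_guts() below unions the partition sets of the arguments of each OR clause and then intersects that with the set obtained from the plain ANDed clauses. The following is only a simplified sketch of that combination step under stated assumptions: a 64-bit mask stands in for Bitmapset, and PartSet, OrClause, all_parts() and combine_partsets() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t PartSet;		/* bit i set means partition i is selected */

/* One OR clause: the partition set selected by each of its arguments. */
typedef struct OrClause
{
	const PartSet *arg_sets;
	int			nargs;
} OrClause;

/* All partitions 0 .. nparts-1, like bms_add_range(NULL, 0, nparts - 1). */
static PartSet
all_parts(int nparts)
{
	assert(nparts > 0 && nparts < 64);
	return (UINT64_C(1) << nparts) - 1;
}

/*
 * Start from the set selected by the ANDed clauses, then narrow it by
 * each OR clause in turn.
 */
static PartSet
combine_partsets(PartSet base, const OrClause *ors, int nors)
{
	PartSet		result = base;

	for (int i = 0; i < nors; i++)
	{
		PartSet		or_set = 0;

		for (int j = 0; j < ors[i].nargs; j++)
			or_set |= ors[i].arg_sets[j];	/* like bms_union */
		result &= or_set;					/* like bms_intersect */
	}
	return result;
}
```

For instance, with four partitions and a single OR clause whose arguments select partitions 0 and 2 respectively, starting from all_parts(4) leaves exactly partitions 0 and 2.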
src/backend/catalog/partition.c | 949 +++++++++++++++++++++++++++++++++++++++-
src/include/catalog/partition.h | 3 +-
2 files changed, 947 insertions(+), 5 deletions(-)
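Another note: the header comment of classify_partition_bounding_keys() below says that of a > 1, a > 2, and a >= 5, the value 5 is the "best" (inclusive) min bound. Here is a minimal sketch of that selection for a single integer column. It only compares constants, as an illustration, and is not the remove_redundant_clauses() logic from the patch; MinBound, min_bound_tighter() and best_min_bound() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* A lower bound on one key column: "col > value" or "col >= value". */
typedef struct MinBound
{
	int			value;
	bool		inclusive;
} MinBound;

/* Is 'a' at least as restrictive a lower bound as 'b'? */
static bool
min_bound_tighter(MinBound a, MinBound b)
{
	if (a.value != b.value)
		return a.value > b.value;
	/* Same constant: "> v" is tighter than ">= v". */
	return !a.inclusive || b.inclusive;
}

/* Pick the most restrictive lower bound out of n candidates (n > 0). */
static MinBound
best_min_bound(const MinBound *bounds, int n)
{
	MinBound	best;

	assert(n > 0);
	best = bounds[0];
	for (int i = 1; i < n; i++)
	{
		if (min_bound_tighter(bounds[i], best))
			best = bounds[i];
	}
	return best;
}
```

Given the three bounds from the comment's example, best_min_bound() picks value 5 with inclusive set, matching the comment.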
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5e601dd0a4..17b6a8a258 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -27,6 +27,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -37,6 +39,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "rewrite/rewriteManip.h"
@@ -111,6 +115,67 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Specifies the type of NullTest that was applied to each of the
+ * partition key columns or -1 if none was applied. Partitioning handles
+ * null partition keys specially depending on the partitioning method in
+ * use, so get_partitions_for_keys can return partitions according to
+ * the nullness condition for partition keys.
+ */
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -150,6 +215,21 @@ static int partition_bound_bsearch(PartitionKey key,
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
+static Bitmapset *get_partitions_from_clauses_guts(Relation relation,
+ List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool partkey_datum_from_expr(const Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1422,7 +1502,7 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1432,16 +1512,877 @@ get_partition_dispatch_recurse(Relation rel, Relation parent,
Bitmapset *
get_partitions_from_clauses(Relation relation, List *partclauses)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
- Bitmapset *result = NULL;
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ if (partconstr)
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+
+ result = get_partitions_from_clauses_guts(relation, partclauses);
- result = bms_add_range(result, 0, partdesc->nparts - 1);
return result;
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_guts
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_guts(Relation relation, List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ nkeys = classify_partition_bounding_keys(relation, clauses,
+ &keys, &constfalse,
+ &or_clauses);
+ /*
+ * Only look up in the partition descriptor if the query provides
+ * constraints on the keys at all.
+ */
+ if (nkeys > 0 && !constfalse)
+ result = get_partitions_for_keys(relation, &keys);
+ else if (!constfalse)
+ /* No constraints on the keys, so, return *all* partitions. */
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ Bitmapset *or_partset = NULL;
+
+ foreach(lc1, or->args)
+ {
+ Expr *orarg = lfirst(lc1);
+ Bitmapset *arg_partset;
+
+ arg_partset = get_partitions_from_clauses_guts(relation,
+ list_make1(orarg));
+
+ /* Combine partition sets obtained from mutually ORed clauses. */
+ or_partset = bms_union(or_partset, arg_partset);
+ }
+
+ /* Combine partition sets obtained from mutually ANDed clauses. */
+ result = bms_intersect(result, or_partset);
+ }
+
+ return result;
+}
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, max keys, along with any
+ * Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max bound.
+ * For example, of a > 1, a > 2, and a >= 5, "5" is the best min bound for
+ * the column a, which also happens to be an inclusive bound.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by equality clauses. Min and maximum bounds could contain
+ * bound values for only a prefix of key columns.
+ *
+ * If the list contains a pseudo-constant clause that evaluates to false,
+ * *constfalse is set to true and no keys are set. It is also set if we
+ * encounter mutually contradictory clauses in this function ourselves, for
+ * example, having both a > 1 and a = 0 in the list.
+ *
+ * All the OR clauses encountered in the list are added to *or_clauses. It's
+ * the responsibility of the caller to process the argument clauses of each of
+ * the OR clauses, which would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_or_clauses = true;
+ Expr *eqkey_exprs[PARTITION_MAX_KEYS],
+ *minkey_exprs[PARTITION_MAX_KEYS],
+ *maxkey_exprs[PARTITION_MAX_KEYS];
+ NullTestType keynullness[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max,
+ eqkey_set[PARTITION_MAX_KEYS],
+ minkey_set[PARTITION_MAX_KEYS],
+ maxkey_set[PARTITION_MAX_KEYS],
+ min_incl,
+ max_incl;
+ int n_eqkeys = 0,
+ n_minkeys = 0,
+ n_maxkeys = 0,
+ n_keynullness = 0,
+ n_total = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ if (partattno == 0)
+ {
+ partexpr = lfirst(partexprs_item);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause;
+ Expr *leftop,
+ *rightop;
+
+ opclause = (OpExpr *) clause;
+ leftop = linitial(opclause->args);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = lsecond(opclause->args);
+ /* Skip if leftop doesn't match this partition key column. */
+ if ((!IsA(leftop, Var) ||
+ ((Var *) leftop)->varattno != partattno) &&
+ !equal(leftop, partexpr))
+ continue;
+
+ /*
+ * Deal with <> operators that the planner allows if it finds
+ * out that <>'s negator is indeed a valid partopfamily member.
+ * Make an equivalent OR expression and add to the *or_clauses
+ * list. That is, we convert a <> opclause into
+ * (leftop < rightop) OR (leftop > rightop).
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily) &&
+ (partkey->strategy == PARTITION_STRATEGY_RANGE ||
+ partkey->strategy == PARTITION_STRATEGY_LIST))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ negator = get_negator(opclause->opno);
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ elog(LOG, "unexpected negator of '<>' operator");
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop, (Expr *) rightop,
+ InvalidOid, partcoll);
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr),
+ -1));
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->op = opclause;
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+
+ /* A strict operator implies NOT NULL argument. */
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = IS_NOT_NULL;
+ n_keynullness++;
+ }
+ only_or_clauses = false;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * Planner must have accepted this saop iff saop_op's negator
+ * was found to be a valid partopfamily member.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ negated = true;
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /* Use a separate counter; i indexes the partition key columns. */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ /* Does arg match with this partition key column? */
+ if ((IsA(arg, Var) && partattno != 0 &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (keynullness[i] == -1)
+ {
+ keynullness[i] = nulltest->nulltesttype;
+ n_keynullness++;
+ }
+ only_or_clauses = false;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_or_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Redundant key elimination using btree-semantics based tricks.
+ *
+ * Only list and range partitioning use btree operator semantics, so
+ * skip otherwise. Also, if there are expressions whose value is yet
+ * unknown, skip this step, because we need to compare actual values
+ * below.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ if (partkey->strategy == PARTITION_STRATEGY_LIST ||
+ partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i],
+ &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys. An equal bounding key must contain all partition key
+ * columns, whereas a prefix of all partition key columns is admissible
+ * as min and max keys.
+ */
+ memset(eqkey_exprs, 0, sizeof(eqkey_exprs));
+ memset(minkey_exprs, 0, sizeof(minkey_exprs));
+ memset(maxkey_exprs, 0, sizeof(maxkey_exprs));
+ memset(eqkey_set, false, sizeof(eqkey_set));
+ memset(minkey_set, false, sizeof(minkey_set));
+ memset(maxkey_set, false, sizeof(maxkey_set));
+
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * If no scan key existed for the previous column, we are done.
+ */
+ if (i > n_eqkeys)
+ need_next_eq = false;
+
+ if (i > n_minkeys)
+ need_next_min = false;
+
+ if (i > n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ int strategy = clause->op_strategy;
+
+ switch (strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = (strategy == BTLessEqualStrategyNumber);
+
+ if (strategy == BTLessStrategyNumber)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = (strategy == BTGreaterEqualStrategyNumber);
+
+ if (strategy == BTGreaterStrategyNumber)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ case BTEqualStrategyNumber:
+ if (need_next_eq)
+ {
+ eqkey_exprs[i] = clause->constarg;
+ if (!eqkey_set[i])
+ n_eqkeys++;
+ eqkey_set[i] = true;
+ }
+
+ if (need_next_min)
+ {
+ minkey_exprs[i] = clause->constarg;
+ if (!minkey_set[i])
+ n_minkeys++;
+ minkey_set[i] = true;
+ min_incl = true;
+ }
+
+ if (need_next_max)
+ {
+ maxkey_exprs[i] = clause->constarg;
+ if (!maxkey_set[i])
+ n_maxkeys++;
+ maxkey_set[i] = true;
+ max_incl = true;
+ }
+ break;
+
+ /*
+ * Ideally, never get here, because 1. we don't support
+ * operators that are not btree operators and 2. clauses
+ * containing '<>' which are not listed in the btree operator
+ * families have already been handled by the higher-level
+ * code.
+ */
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we have equal keys for all the partition key columns, then mark
+ * their copies in minkeys and maxkeys as invalid, so that we perform
+ * partition lookup using only eqkeys. Otherwise, don't pass any equal
+ * keys at all.
+ */
+ if (n_eqkeys == partkey->partnatts)
+ n_minkeys = n_maxkeys = 0;
+ else
+ n_eqkeys = 0;
+
+ /* Populate keys. */
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ if (n_eqkeys + n_minkeys + n_maxkeys + n_keynullness > 0)
+ {
+ Datum value;
+ int n_datums_resolved;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_eqkeys; i++)
+ {
+ if (partkey_datum_from_expr(eqkey_exprs[i], &value))
+ {
+ keys->eqkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_eqkeys = n_datums_resolved;
+ n_total += keys->n_eqkeys;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_minkeys; i++)
+ {
+ if (partkey_datum_from_expr(minkey_exprs[i], &value))
+ {
+ keys->minkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_minkeys = n_datums_resolved;
+ n_total += keys->n_minkeys;
+ keys->min_incl = min_incl;
+
+ n_datums_resolved = 0;
+ for (i = 0; i < n_maxkeys; i++)
+ {
+ if (partkey_datum_from_expr(maxkey_exprs[i], &value))
+ {
+ keys->maxkeys[i] = value;
+ n_datums_resolved++;
+ }
+ }
+ keys->n_maxkeys = n_datums_resolved;
+ n_total += keys->n_maxkeys;
+ keys->max_incl = max_incl;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ keys->keynullness[i] = keynullness[i];
+ n_total += n_keynullness;
+ }
+
+ return n_total;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(const Expr *expr, Datum *value)
+{
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse)
+{
+ Oid partopfamily = partkey->partopfamily[partattoff];
+ Oid partopcintype = partkey->partopcintype[partattoff];
+ PartClause *xform[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ /*
+ * xform[s] points to the currently best scan key of strategy type s+1; it
+ * is NULL if we haven't yet found such a key for this attr.
+ */
+ memset(xform, 0, sizeof(xform));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+ get_op_opfamily_properties(cur->op->opno, partopfamily, false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ s = cur->op_strategy - 1;
+ /* Have we seen a clause of this strategy before? */
+ if (xform[s] == NULL)
+ {
+ /* nope, so assign. */
+ xform[s] = cur;
+ }
+ else
+ {
+ /* yup, keep only the more restrictive key. */
+ if (partition_cmp_args(partopfamily, partopcintype,
+ cur, cur, xform[s],
+ &test_result))
+ {
+ if (test_result)
+ xform[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* else the old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in xform[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ /* Finished processing all clauses. Now compare across strategies. */
+ if (xform[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = xform[BTEqualStrategyNumber - 1];
+
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ {
+ PartClause *chk = xform[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ xform[s] = NULL;
+ }
+ }
+ }
+
+ /* try to keep only one of <, <= */
+ if (xform[BTLessStrategyNumber - 1] &&
+ xform[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = xform[BTLessStrategyNumber - 1],
+ *le = xform[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* try to keep only one of >, >= */
+ if (xform[BTGreaterStrategyNumber - 1] &&
+ xform[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = xform[BTGreaterStrategyNumber - 1],
+ *ge = xform[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partopfamily, partopcintype, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ xform[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ xform[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * xform now contains the "best" clause for this partition key column
+ * for each btree strategy number. Append them to *result.
+ */
+ for (s = BTMaxStrategyNumber; --s >= 0;)
+ if (xform[s])
+ *result = lappend(*result, xform[s]);
+}
+
+static bool
+partition_cmp_args(Oid partopfamily, Oid partopcintype,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ if (!partkey_datum_from_expr(leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg and rightarg clauses' constants are both of the type
+ * expected by "op" clause's operator, then compare them using the
+ * latter's comparison function.
+ */
+ if (leftarg->op_subtype == partopcintype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ Bitmapset *result = NULL;
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 4a1ce92569..81c626fa4a 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -109,5 +109,6 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
extern List *get_proposed_default_constraint(List *new_part_constaints);
/* For partition-pruning */
-Bitmapset get_partitions_from_clauses(Relation relation, List *partclauses);
+extern Bitmapset *get_partitions_from_clauses(Relation relation,
+ List *partclauses);
#endif /* PARTITION_H */
--
2.11.0
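To make the redundant-clause elimination in remove_redundant_clauses() easier to follow, here is a minimal standalone sketch of the same idea on a single integer column. It is not the patch's code: SimpleClause, eval_op() and reduce_clauses() are hypothetical names, plain int comparisons stand in for the operator family lookups, and every constant is assumed to be known (so the "couldn't compare" path of partition_cmp_args() does not arise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers (1..5). */
enum { LT = 1, LE = 2, EQ = 3, GE = 4, GT = 5, MAX_STRATEGY = 5 };

/* One "key <op> constant" clause on a single integer column. */
typedef struct SimpleClause
{
    int strategy;
    int value;
} SimpleClause;

/* Evaluate "left <op> right" for the given strategy. */
static bool
eval_op(int strategy, int left, int right)
{
    switch (strategy)
    {
        case LT: return left < right;
        case LE: return left <= right;
        case EQ: return left == right;
        case GE: return left >= right;
        case GT: return left > right;
        default: return false;
    }
}

/*
 * Reduce the clauses on one column to at most one per strategy.  Survivors
 * are written to out[] (room for MAX_STRATEGY entries) in ascending strategy
 * order and their count is returned.  Sets *constfalse and returns 0 if the
 * clauses contradict each other.
 */
static int
reduce_clauses(const SimpleClause *in, int n, SimpleClause *out,
               bool *constfalse)
{
    const SimpleClause *best[MAX_STRATEGY] = {NULL};
    const SimpleClause *eq;
    int i, s, nout = 0;

    *constfalse = false;
    for (i = 0; i < n; i++)
    {
        s = in[i].strategy - 1;
        assert(s >= 0 && s < MAX_STRATEGY);
        if (best[s] == NULL ||
            eval_op(in[i].strategy, in[i].value, best[s]->value))
            best[s] = &in[i];   /* first one seen, or more restrictive */
        else if (in[i].strategy == EQ)
        {
            *constfalse = true; /* key = a AND key = b, with a <> b */
            return 0;
        }
    }

    /* An equality clause either contradicts or subsumes every other one. */
    eq = best[EQ - 1];
    for (s = 0; eq != NULL && s < MAX_STRATEGY; s++)
    {
        if (best[s] == NULL || s == EQ - 1)
            continue;
        if (!eval_op(s + 1, eq->value, best[s]->value))
        {
            *constfalse = true;
            return 0;
        }
        best[s] = NULL;
    }

    /* Keep only one of <, <= */
    if (best[LT - 1] && best[LE - 1])
    {
        if (eval_op(LE, best[LT - 1]->value, best[LE - 1]->value))
            best[LE - 1] = NULL;
        else
            best[LT - 1] = NULL;
    }

    /* Keep only one of >, >= */
    if (best[GT - 1] && best[GE - 1])
    {
        if (eval_op(GE, best[GT - 1]->value, best[GE - 1]->value))
            best[GE - 1] = NULL;
        else
            best[GT - 1] = NULL;
    }

    for (s = 0; s < MAX_STRATEGY; s++)
        if (best[s])
            out[nout++] = *best[s];
    return nout;
}
```

For example, with the clauses key < 10, key < 5, key >= 2 and key > 3, only key < 5 and key > 3 survive, whereas key = 4 together with key < 3 sets *constfalse.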
Attachment: 0005-Some-interface-changes-for-partition_bound_-cmp-bsea-v11.patch (text/plain; charset=UTF-8)
From 47f02c8c1a3346ad4ad91a5544b98eeaa44caaec Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 5/7] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 135 ++++++++++++++++++++++++++++------------
1 file changed, 96 insertions(+), 39 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 17b6a8a258..a2068ae422 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -176,6 +176,30 @@ typedef struct PartScanKeyInfo
NullTestType keynullness[PARTITION_MAX_KEYS];
} PartScanKeyInfo;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
@@ -204,14 +228,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static void get_partition_dispatch_recurse(Relation rel, Relation parent,
List **pds, List **leaf_part_oids);
@@ -903,10 +928,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -957,6 +988,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -978,8 +1010,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -993,9 +1028,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3382,12 +3417,15 @@ get_partition_for_tuple(PartitionDispatch *pd,
{
bool equal = false;
int cur_offset;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (cur_offset >= 0 && equal)
cur_index = partdesc->boundinfo->indexes[cur_offset];
}
@@ -3399,6 +3437,7 @@ get_partition_for_tuple(PartitionDispatch *pd,
range_partkey_has_null = false;
int cur_offset;
int i;
+ PartitionBoundCmpArg arg;
/*
* No range includes NULL, so this will be accepted by the
@@ -3429,12 +3468,13 @@ get_partition_for_tuple(PartitionDispatch *pd,
if (range_partkey_has_null)
break;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
cur_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ &arg, &equal);
/*
* The offset returned is such that the bound at
* cur_offset is less than or equal to the tuple value, so
@@ -3631,12 +3671,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -3658,11 +3698,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -3670,17 +3710,35 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
switch (key->strategy)
{
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -3691,12 +3749,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -3710,20 +3769,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -3736,8 +3794,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
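The point of carrying ndatums in the comparison argument can be illustrated with a small standalone sketch. It is only an approximation under stated assumptions: bounds are rows of NKEYS plain ints sorted lexicographically, and ProbeArg, bound_cmp() and bound_bsearch() are hypothetical names rather than the functions touched by this patch. The search has the same contract as partition_bound_bsearch(): it returns the greatest bound that is less than or equal to the probe, or -1 if there is none.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2                 /* number of key columns in this sketch */

/* Hypothetical probe: only the first ndatums key columns are specified. */
typedef struct ProbeArg
{
    const int *datums;
    int ndatums;
} ProbeArg;

/* Compare a bound with the probe, looking only at the specified prefix. */
static int
bound_cmp(const int bound[NKEYS], const ProbeArg *arg)
{
    int i;

    assert(arg->ndatums >= 0 && arg->ndatums <= NKEYS);
    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound[i] < arg->datums[i])
            return -1;
        if (bound[i] > arg->datums[i])
            return 1;
    }
    return 0;                   /* equal on every specified column */
}

/*
 * Return the offset of the greatest bound that is <= the probe, or -1 if
 * all bounds are greater.  *is_equal tells whether the bound at the returned
 * offset compared equal to the probe.
 */
static int
bound_bsearch(const int bounds[][NKEYS], int nbounds, const ProbeArg *arg,
              bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        int cmpval = bound_cmp(bounds[mid], arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

When the probe specifies only a prefix of the key, several adjacent bounds can compare equal to it, and the search stops at whichever of them it hits first. That is why a caller that passes a prefix has to walk left or right from the returned offset to find the first or last matching bound.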
Attachment: 0006-Tweak-default-range-partition-s-constraint-a-little-v11.patch (text/plain; charset=UTF-8)
From 473dc4917eccb402b570961c92754ea342d9445d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 31 Oct 2017 16:26:55 +0900
Subject: [PATCH 6/7] Tweak default range partition's constraint a little
When used as a predicate, it's useful for it to explicitly say that
the default range partition might contain nulls, because non-default
range partitions can't.
---
src/backend/catalog/partition.c | 29 +++++++++++++++++++++++------
src/test/regress/expected/update.out | 2 +-
2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index a2068ae422..fb973ef20c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2934,12 +2934,29 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (or_expr_args != NIL)
{
- /* OR all the non-default partition constraints; then negate it */
- result = lappend(result,
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args, -1)
- : linitial(or_expr_args));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ Expr *other_parts_constr;
+
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
return result;
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index cef70b1a1e..40217bdf9c 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -227,7 +227,7 @@ create table part_def partition of range_parted default;
a | text | | | | extended | |
b | integer | | | | plain | |
Partition of: range_parted DEFAULT
-Partition constraint: (NOT (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20))))
+Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20)))))
insert into range_parted values ('c', 9);
-- ok
--
2.11.0
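Why the added IS NOT NULL tests matter can be seen by evaluating both shapes of the constraint under SQL's three-valued logic. The following is a minimal sketch for a single nullable integer key and a single non-default partition covering [1, 10); TriBool, NullableInt and the function names are hypothetical and are not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* SQL's three truth values. */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* Hypothetical nullable integer key value. */
typedef struct NullableInt
{
    bool isnull;
    int value;
} NullableInt;

static TriBool
tv_not(TriBool a)
{
    if (a == TV_UNKNOWN)
        return TV_UNKNOWN;
    return (a == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

static TriBool
tv_and(TriBool a, TriBool b)
{
    if (a == TV_FALSE || b == TV_FALSE)
        return TV_FALSE;
    if (a == TV_UNKNOWN || b == TV_UNKNOWN)
        return TV_UNKNOWN;
    return TV_TRUE;
}

/* (a >= lo AND a < hi); unknown when a is null. */
static TriBool
in_range(NullableInt a, int lo, int hi)
{
    assert(lo < hi);
    if (a.isnull)
        return TV_UNKNOWN;
    return (a.value >= lo && a.value < hi) ? TV_TRUE : TV_FALSE;
}

/* (a IS NOT NULL) is never unknown. */
static TriBool
is_not_null(NullableInt a)
{
    return a.isnull ? TV_FALSE : TV_TRUE;
}

/* Old shape: NOT (a >= 1 AND a < 10) */
static TriBool
default_constraint_old(NullableInt a)
{
    return tv_not(in_range(a, 1, 10));
}

/* New shape: NOT ((a IS NOT NULL) AND (a >= 1 AND a < 10)) */
static TriBool
default_constraint_new(NullableInt a)
{
    return tv_not(tv_and(is_not_null(a), in_range(a, 1, 10)));
}
```

For a null key the old shape evaluates to unknown, while the new shape evaluates to true, which states outright that such a row may live in the default partition. For non-null keys the two shapes agree.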
Attachment: 0007-Implement-get_partitions_for_keys-v11.patch (text/plain; charset=UTF-8)
From 6b04f3560b673005fa8b9a81262754227349af89 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 7/7] Implement get_partitions_for_keys
Disable constraint_exclusion using internal partition constraints.
---
src/backend/catalog/partition.c | 355 +++++++++++++++++++++++++++++++-
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition.out | 121 ++++-------
3 files changed, 391 insertions(+), 89 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fb973ef20c..c58c7354c6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2409,10 +2409,361 @@ partition_cmp_args(Oid partopfamily, Oid partopcintype,
static Bitmapset *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
Bitmapset *result = NULL;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return NULL;
+
+ /*
+ * Check if any of the scan keys are null. If so, return the only
+ * null-accepting partition if boundinfo says there is one.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keynullness[i] == IS_NULL)
+ {
+ int other_idx = -1;
+
+ /*
+ * Note that only one of the null-accepting partition and the
+ * default partition can be holding null values at any given
+ * time.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ result = bms_make_singleton(other_idx);
+
+ return result;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+ return result;
+ }
+ /* No bounding keys, so just return all partitions. */
+ else if (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys == 0)
+ {
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+ }
+
+ /* Valid keys->eqkeys must provide all partition keys. */
+ Assert(keys->n_eqkeys == 0 || keys->n_eqkeys == partkey->partnatts);
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0)
+ {
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* For list partition, must exactly match the datum. */
+ if (!is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ eqoff += 1;
+ }
+ }
- result = bms_add_range(result, 0, partdesc->nparts - 1);
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ result = bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return result;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+
+ /*
+ * minoff set to -1 means all datums are greater than
+ * minkeys, which means all partitions satisfy minkeys.
+ */
+ if (minoff == -1)
+ minoff = 0;
+
+ /*
+ * minkeys matched one of the datums (is_equal is true), but
+ * the query may have asked to exclude that value. If so,
+ * move to the bound on the right, which doesn't necessarily
+ * mean we're excluding the list partition containing that
+ * value, because there very well might be values in the range
+ * thus selected that belong to the partition to which the
+ * matched value (minkeys) also belongs.
+ */
+ if (is_equal && !keys->min_incl)
+ minoff++;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bounds share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ if (minoff < boundinfo->ndatums - 1)
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /* See the comment above for minkeys. */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exist at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Add each index to the result individually, because list
+ * partition indexes are not monotonically increasing like
+ * range partitions' are.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
return result;
}
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..661f137122 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 484c6fe750..206c6de5b7 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -675,16 +667,14 @@ create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mc3p where a = 1;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+-------------------------
Append
-> Seq Scan on mc3p0
Filter: (a = 1)
-> Seq Scan on mc3p1
Filter: (a = 1)
- -> Seq Scan on mc3p_default
- Filter: (a = 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
QUERY PLAN
@@ -702,9 +692,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
Filter: ((a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
QUERY PLAN
@@ -714,9 +702,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -730,9 +716,7 @@ explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-> Seq Scan on mc3p4
Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
- -> Seq Scan on mc3p_default
- Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a > 10;
QUERY PLAN
@@ -771,16 +755,14 @@ explain (costs off) select * from mc3p where a >= 10;
(17 rows)
explain (costs off) select * from mc3p where a < 10;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on mc3p0
Filter: (a < 10)
-> Seq Scan on mc3p1
Filter: (a < 10)
- -> Seq Scan on mc3p_default
- Filter: (a < 10)
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
QUERY PLAN
@@ -792,9 +774,7 @@ explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
Filter: ((a <= 10) AND (abs(b) < 10))
-> Seq Scan on mc3p2
Filter: ((a <= 10) AND (abs(b) < 10))
- -> Seq Scan on mc3p_default
- Filter: ((a <= 10) AND (abs(b) < 10))
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
QUERY PLAN
@@ -821,8 +801,8 @@ explain (costs off) select * from mc3p where a > 20;
(3 rows)
explain (costs off) select * from mc3p where a >= 20;
- QUERY PLAN
---------------------------------
+ QUERY PLAN
+---------------------------
Append
-> Seq Scan on mc3p5
Filter: (a >= 20)
@@ -830,9 +810,7 @@ explain (costs off) select * from mc3p where a >= 20;
Filter: (a >= 20)
-> Seq Scan on mc3p7
Filter: (a >= 20)
- -> Seq Scan on mc3p_default
- Filter: (a >= 20)
-(9 rows)
+(7 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
QUERY PLAN
@@ -872,9 +850,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
-(11 rows)
+(9 rows)
explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
QUERY PLAN
@@ -912,9 +888,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-> Seq Scan on mc3p4
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
-(13 rows)
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
QUERY PLAN
@@ -926,9 +900,7 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 a
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-> Seq Scan on mc3p2
Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
- -> Seq Scan on mc3p_default
- Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
-(9 rows)
+(7 rows)
-- a simpler multi-column keys case
create table mc2p (a int, b int) partition by range (a, b);
@@ -999,28 +971,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1030,33 +994,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
--
2.11.0
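
The upper-bound search in the range case of the patch above can be illustrated with a much-simplified sketch. It assumes a single integer partition key and a plain sorted array of bounds instead of PartitionBoundInfo. The names bound_bsearch, rightmost_upper_offset and range_needs_default are hypothetical and do not appear in the patch; only the adjustment rule ("if (!is_equal || max_incl) maxoff += 1") and the scan for negative entries in indexes[] are taken from it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the offset of the greatest bound that is <= key, or -1 if every
 * bound is greater.  *is_equal reports whether that bound equals key.
 * bounds[] must be sorted in ascending order without duplicates.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int key, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    assert(nbounds >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;

        if (bounds[mid] <= key)
        {
            lo = mid;
            *is_equal = (bounds[mid] == key);
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Offset of the upper bound of the rightmost range that a query
 * "a <= maxkey" (max_incl) or "a < maxkey" (!max_incl) can touch.
 * A result equal to nbounds means the range above the last bound.
 */
static int
rightmost_upper_offset(const int *bounds, int nbounds, int maxkey,
                       bool max_incl)
{
    bool is_equal;
    int  maxoff = bound_bsearch(bounds, nbounds, maxkey, &is_equal);

    /* Same adjustment as in the patch's range case. */
    if (!is_equal || max_incl)
        maxoff += 1;
    return maxoff;
}

/*
 * indexes[i] is the partition whose upper bound is at offset i, or -1 if
 * no partition covers that range.  A gap inside [minoff, maxoff] means
 * the default partition has to be scanned as well.
 */
static bool
range_needs_default(const int *indexes, int minoff, int maxoff)
{
    for (int i = minoff; i <= maxoff; i++)
    {
        if (indexes[i] < 0)
            return true;
    }
    return false;
}
```

With bounds {1, 10, 15, 20}, loosely modelled on the rlp table in the tests, "a <= 1" yields offset 1 (the range from 1 to 10) and "a < 1" yields offset 0 (the range below 1).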
0001-Add-new-tests-for-partition-pruning-v11.patch (text/plain; charset=UTF-8)
From 1c8d55f8235f28135ab8f8f00e23afe469253e06 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 1/7] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 1085 +++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 155 +++++
4 files changed, 1242 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..484c6fe750
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,1085 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(31 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on rlp_default_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a is not null;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3abcd
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3efgh
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_10
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_30
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NOT NULL)
+(29 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_null
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(25 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain (costs off) select * from mc2p where a < 2;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p0
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default
+ Filter: (a < 2)
+(9 rows)
+
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p3
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain (costs off) select * from mc2p where a > 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p3
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default
+ Filter: (a > 1)
+(9 rows)
+
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p2
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+explain (costs off) select * from boolpart where a in (true, false);
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+ -> Seq Scan on boolpart_t
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+(5 rows)
+
+explain (costs off) select * from boolpart where a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_t
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_default
+ Filter: (NOT a)
+(7 rows)
+
+explain (costs off) select * from boolpart where not a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: a
+ -> Seq Scan on boolpart_t
+ Filter: a
+ -> Seq Scan on boolpart_default
+ Filter: a
+(7 rows)
+
+explain (costs off) select * from boolpart where a is true or a is not true;
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT TRUE)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true and a is not false;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS UNKNOWN)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT UNKNOWN)
+(7 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..392cb9bbe9
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,155 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;
+explain (costs off) select * from rlp where a is not null;
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (1, maxvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain (costs off) select * from mc2p where a < 2;
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+explain (costs off) select * from mc2p where a > 1;
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+
+explain (costs off) select * from boolpart where a in (true, false);
+explain (costs off) select * from boolpart where a = false;
+explain (costs off) select * from boolpart where not a = false;
+explain (costs off) select * from boolpart where a is true or a is not true;
+explain (costs off) select * from boolpart where a is not true;
+explain (costs off) select * from boolpart where a is not true and a is not false;
+explain (costs off) select * from boolpart where a is unknown;
+explain (costs off) select * from boolpart where a is not unknown;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
--
2.11.0
On 6 November 2017 at 23:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
OK, I have gotten rid of the min/max partition index interface and instead
adopted the bms_add_range() approach by including your patch to add the
same in the patch set (which is now 0002 in the whole set). I have to
admit that it's simpler to understand the new code with just Bitmapsets to
look at, but I'm still a bit concerned about materializing the whole set
right within partition.c, although we can perhaps optimize it later.
Thanks for making that change. The code looks much more simple now.
For performance, if you're worried about a very large number of
partitions, then I think you're better off using bms_next_member()
rather than bms_first_member(), (likely this applies globally, but you
don't need to worry about those).
The problem with bms_first_member() is that on every call it must loop
over all the leading zero words before it finds a set bit, whereas
bms_next_member() resumes from the word it stopped at on the previous
call. There will likely be a pretty big performance difference between
the two when processing a large Bitmapset.
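To make the difference concrete, here is a minimal, self-contained sketch in C. It uses a toy fixed-size bitmap: ToyBitmap and all toy_* functions are hypothetical names invented for this illustration, not PostgreSQL's Bitmapset code. It only mimics the two calling conventions: the first-member style removes and returns the smallest member and always scans from word 0, while the next-member style takes the previously returned member and resumes from there. The toy_words_visited counter is an assumption added purely to make the scanning cost visible.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NWORDS 64           /* toy capacity: 64 words of 64 bits */
#define TOY_WORD_BITS 64

/* Toy stand-in for a Bitmapset; not PostgreSQL's actual structure. */
typedef struct ToyBitmap
{
    uint64_t    words[TOY_NWORDS];
} ToyBitmap;

/* Instrumentation only: how many words the scans have looked at. */
static long toy_words_visited = 0;

static void toy_add_member(ToyBitmap *b, int x)
{
    assert(x >= 0 && x < TOY_NWORDS * TOY_WORD_BITS);
    b->words[x / TOY_WORD_BITS] |= (uint64_t) 1 << (x % TOY_WORD_BITS);
}

/* Position of the lowest set bit in a nonzero word. */
static int toy_lowest_bit(uint64_t w)
{
    int         bit = 0;

    for (; (w & 1) == 0; w >>= 1)
        bit++;
    return bit;
}

/* Remove and return the smallest member, or -1 if empty; scans from word 0. */
static int toy_first_member(ToyBitmap *b)
{
    for (int wordnum = 0; wordnum < TOY_NWORDS; wordnum++)
    {
        toy_words_visited++;
        if (b->words[wordnum] != 0)
        {
            int         bit = toy_lowest_bit(b->words[wordnum]);

            b->words[wordnum] &= ~((uint64_t) 1 << bit);
            return wordnum * TOY_WORD_BITS + bit;
        }
    }
    return -1;
}

/*
 * Return the smallest member greater than prevbit, or -2 if there is none.
 * The set is not modified; the scan starts at the word holding prevbit + 1.
 */
static int toy_next_member(const ToyBitmap *b, int prevbit)
{
    int         start = prevbit + 1;
    int         wordnum = start / TOY_WORD_BITS;
    uint64_t    mask;

    if (wordnum >= TOY_NWORDS)
        return -2;
    /* Ignore bits at or below prevbit in the first word examined. */
    mask = ~(uint64_t) 0 << (start % TOY_WORD_BITS);
    for (; wordnum < TOY_NWORDS; wordnum++)
    {
        uint64_t    w = b->words[wordnum] & mask;

        toy_words_visited++;
        if (w != 0)
            return wordnum * TOY_WORD_BITS + toy_lowest_bit(w);
        mask = ~(uint64_t) 0;   /* later words are considered in full */
    }
    return -2;
}

/* Sum the members with the destructive loop, working on a private copy. */
static long toy_sum_with_first_member(ToyBitmap copy)
{
    long        sum = 0;
    int         x;

    while ((x = toy_first_member(&copy)) >= 0)
        sum += x;
    return sum;
}

/* Sum the members with the resumable, non-destructive loop. */
static long toy_sum_with_next_member(const ToyBitmap *b)
{
    long        sum = 0;
    int         x = -1;

    while ((x = toy_next_member(b, x)) >= 0)
        sum += x;
    return sum;
}
```

When all the members sit near the high end of the bitmap, both loops produce the same sum, but the first-member loop walks the empty leading words again on every call, so it visits far more words in total.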
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 7 November 2017 at 01:52, David Rowley <david.rowley@2ndquadrant.com> wrote:
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
I have a little more review to share:
1. Missing "in" in comment. Should be "mentioned in"
* get_append_rel_partitions
* Return the list of partitions of rel that pass the clauses mentioned
* rel->baserestrictinfo
2. In the following fragment, the variable should be declared in the
inner scope:
void
set_basic_child_rel_properties(PlannerInfo *root,
                               RelOptInfo *rel,
                               RelOptInfo *childrel,
                               AppendRelInfo *appinfo)
{
    AttrNumber  attno;
    if (rel->part_scheme)
    {
Doing so would make the code the same as where you moved it from.
3. Normally lfirst() is assigned to a variable at the start of a
foreach() loop. You have code which does not follow this.
foreach(lc, clauses)
{
    Expr       *clause;
    int         i;
    if (IsA(lfirst(lc), RestrictInfo))
    {
        RestrictInfo *rinfo = lfirst(lc);
You could assign this to a Node * since the type is unknown to you at
the start of the loop.
4.
/*
 * Useless if what we're thinking of as a constant is actually
 * a Var coming from this relation.
 */
if (bms_is_member(rel->relid, constrelids))
    continue;
Should this be moved to just above the op_strict() test? This one seems cheaper.
5. Typo "paritions": /* No clauses to prune paritions, so scan all
partitions. */
But thinking about it more, the comment should be something more along
the lines of /* No useful clauses for partition pruning. Scan all
partitions. */
The key difference is that there might be clauses, just without Consts.
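The loop shape suggested in item 3 above can be sketched in a small, self-contained C example. The ToyNode types and toy_count_opexprs() are hypothetical stand-ins invented for illustration, not PostgreSQL's Node, RestrictInfo or foreach() machinery. The only point is the pattern: fetch the list element once into a generic node pointer at the top of the loop, then check its tag and narrow the type from there.

```c
#include <stddef.h>

/* Toy node tags; stand-ins for PostgreSQL's NodeTag values. */
typedef enum ToyNodeTag
{
    T_ToyOpExpr,
    T_ToyRestrictInfo,
    T_ToyOther
} ToyNodeTag;

/* Every toy node starts with a ToyNode so it can be handled generically. */
typedef struct ToyNode
{
    ToyNodeTag  type;
} ToyNode;

typedef struct ToyOpExpr
{
    ToyNode     node;
    int         opno;
} ToyOpExpr;

typedef struct ToyRestrictInfo
{
    ToyNode     node;
    ToyNode    *clause;         /* the wrapped expression */
} ToyRestrictInfo;

/* Count operator expressions, looking through any RestrictInfo wrapper. */
static int
toy_count_opexprs(ToyNode **clauses, size_t nclauses)
{
    int         count = 0;

    for (size_t i = 0; i < nclauses; i++)
    {
        /* Fetch the element once, as a generic node, at the top. */
        ToyNode    *node = clauses[i];

        /* Narrow the type only after checking the tag. */
        if (node->type == T_ToyRestrictInfo)
            node = ((ToyRestrictInfo *) node)->clause;

        if (node->type == T_ToyOpExpr)
            count++;
    }
    return count;
}
```

Because the element is read in one place, the rest of the loop body works with a single variable regardless of whether the clause arrived wrapped or bare.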
Actually, the more I look at get_append_rel_partitions(), the more I
think it would be better if you reshaped that if/else if test so that it
only performs the loop over partindexes if it's been set.
I ended up with the attached version of the function after moving
things around a little bit.
I'm still reviewing but thought I'd share this part so far.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017/11/06 21:52, David Rowley wrote:
On 6 November 2017 at 23:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
OK, I have gotten rid of the min/max partition index interface and instead
adopted the bms_add_range() approach by including your patch to add the
same in the patch set (which is now 0002 in the whole set). I have to
admit that it's simpler to understand the new code with just Bitmapsets to
look at, but I'm still a bit concerned about materializing the whole set
right within partition.c, although we can perhaps optimize it later.
Thanks for making that change. The code looks much more simple now.
For performance, if you're worried about a very large number of
partitions, then I think you're better off using bms_next_member()
rather than bms_first_member(), (likely this applies globally, but you
don't need to worry about those).
The problem with bms_first_member is that it must always loop over the
0 words before it finds any bits set for each call, whereas
bms_next_member will start on the word it was last called for. There
will likely be a pretty big performance difference between the two
when processing a large Bitmapset.
Ah, thanks for the explanation. I will change it to bms_next_member() in
the next version.
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
Thank you.
Regards,
Amit
On 7 November 2017 at 01:52, David Rowley <david.rowley@2ndquadrant.com> wrote:
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
Hi Amit,
I had another look over this today. Apologies if any of the review seems petty.
Here goes:
1. This if test seems intended to check for a child that's a partitioned
table, but it actually tests for a non-NULL part_scheme.
/*
* If childrel is itself partitioned, add it and its partitioned
* children to the list being propagated up to the root rel.
*/
if (childrel->part_scheme && rel->part_scheme)
Should this code use IS_PARTITIONED_REL() instead? Seems a bit strange
to test for a NULL part_scheme.
2. There's a couple of mistakes in my bms_add_range() code. I've
attached bms_add_range_fix.patch. Can you apply this to your tree?
3. This assert seems to be Asserting the same thing twice:
Assert(rel->live_partitioned_rels != NIL &&
list_length(rel->live_partitioned_rels) > 0);
A List with length == 0 is always NIL.
4. get_partitions_from_clauses(), can you comment why you perform the
list_concat() there.
I believe this is there so that the partition bound from the parent is
passed down to the child so that we can properly eliminate all child
partitions when the 2nd level of partitioning is using the same
partition key as the 1st level. I think this deserves a paragraph of
comment to explain this.
5. Please add a comment to explain what's going on here in
classify_partition_bounding_keys()
if (partattno == 0)
{
partexpr = lfirst(partexprs_item);
partexprs_item = lnext(partexprs_item);
}
It looks like, similar to index expressions, a partattno of 0 marks a
partition expression, meaning the next expression is consumed from the list.
Does this need validation that there are enough partexprs_item items
like what is done in get_range_key_properties()? Or is this validated
somewhere else?
6. Comment claims the if test will test something which it does not
seem to test for:
/*
* Redundant key elimination using btree-semantics based tricks.
*
* Only list and range partitioning use btree operator semantics, so
* skip otherwise. Also, if there are expressions whose value is yet
* unknown, skip this step, because we need to compare actual values
* below.
*/
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
partkey->strategy == PARTITION_STRATEGY_RANGE)
I was expecting this to be skipped when the clauses contained a
non-const, but it does not seem to.
7. Should be "compare them"
/*
* If the leftarg and rightarg clauses' constants are both of the type
* expected by "op" clause's operator, then compare then using the
* latter's comparison function.
*/
But if I look at the code "compare then using the latter's comparison
function." is not true, it seems to use op's comparison function not
rightarg's. With most of the calls op and rightarg are the same, but
not all of them. The function shouldn't make that assumption even if
the args op was always the same as rightarg.
8. remove_redundant_clauses() needs an overview comment of what the
function does.
9. The comment should explain what we would do in the case of key < 3
AND key <= 2 using some examples.
/* try to keep only one of <, <= */
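To make the kind of example asked for in item 9 concrete, here is a small sketch of picking the tighter of two upper-bound clauses on the same key. It is only an illustration with plain integers; upper_bound and tighter_upper_bound are hypothetical names and do not come from the patch, which works with operator strategies and Datums.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    int  value;      /* constant on the right-hand side of the clause */
    bool inclusive;  /* true for "key <= value", false for "key < value" */
} upper_bound;

/*
 * Given two upper bounds on the same key, return the one that admits
 * fewer values.  The other one is redundant and can be discarded.
 */
static upper_bound
tighter_upper_bound(upper_bound a, upper_bound b)
{
    /* Different constants: the smaller constant is always tighter. */
    if (a.value != b.value)
        return (a.value < b.value) ? a : b;

    /* Same constant: "<" excludes the constant itself, so it is tighter. */
    return a.inclusive ? b : a;
}
```

For key < 3 AND key <= 2 this keeps key <= 2, and for key <= 3 AND key < 3 it keeps key < 3.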
10. Wondering why this loop runs backward?
for (s = BTMaxStrategyNumber; --s >= 0;)
Why not just:
for (s = 0; s < BTMaxStrategyNumber; s++)
I can't see a special reason for it to run backward. It seems unusual,
but if there's a good reason that I've failed to realise then it's
maybe worth a comment.
11. Please comment on why *constfalse = true is set here:
if (!chk || s == (BTEqualStrategyNumber - 1))
continue;
if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
&test_result))
{
if (!test_result)
{
*constfalse = true;
return;
}
/* discard the redundant key. */
xform[s] = NULL;
}
Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.
Also, please add a comment where the redundant key is discarded, perhaps
explaining that equality is more useful than the other strategies when
there's an overlap.
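The situation in item 11 can be sketched as follows, using plain integers rather than Datums and comparison functions. The names toy_op, toy_op_holds and eq_makes_redundant are hypothetical and only illustrate the reasoning: once key = eq is known, any other clause on the same key either holds for that value (so it is redundant) or does not (so the whole conjunction is constant-false).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_OP_LT, TOY_OP_LE, TOY_OP_GE, TOY_OP_GT } toy_op;

/* Does "value op bound" hold? */
static bool
toy_op_holds(int value, toy_op op, int bound)
{
    switch (op)
    {
        case TOY_OP_LT: return value < bound;
        case TOY_OP_LE: return value <= bound;
        case TOY_OP_GE: return value >= bound;
        case TOY_OP_GT: return value > bound;
    }
    return false;
}

/*
 * For "key = eq AND key op bound": return true if the second clause is
 * redundant given the equality.  Otherwise the two clauses contradict
 * each other, and *constfalse is set to true.
 */
static bool
eq_makes_redundant(int eq, toy_op op, int bound, bool *constfalse)
{
    if (toy_op_holds(eq, op, bound))
    {
        *constfalse = false;
        return true;
    }
    *constfalse = true;
    return false;
}
```

With key = 1 AND key > 1 the check fails and constfalse is set. With key = 1 AND key >= 1 the second clause is simply dropped, since equality already pins the key to a single value.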
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: bms_add_range_fix.patch (application/octet-stream)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 34d242b..7146ba1 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -789,7 +789,7 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
* Add members in the range of 'lower' to 'upper' to the set.
*
* Note this could also be done by calling bms_add_member in a loop, however,
- * using this function will be faster when the range is large as we work with
+ * using this function will be faster when the range is large as we're working
* at the bitmapword level rather than at bit level.
*/
Bitmapset *
@@ -834,7 +834,7 @@ bms_add_range(Bitmapset *a, int lower, int upper)
*/
while (wordnum <= uwordnum)
{
- bitmapword mask = (bitmapword) ~0;
+ bitmapword mask = (~(bitmapword) 0);
/* If working on the lower word, zero out bits below 'lower'. */
if (wordnum == lwordnum)
Hi David.
Thanks for the review.
(..also looking at the comments you sent earlier today.)
On 2017/11/07 11:14, David Rowley wrote:
On 7 November 2017 at 01:52, David Rowley <david.rowley@2ndquadrant.com> wrote:
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
I have a little more review to share:
1. Missing "in" in comment. Should be "mentioned in"
* get_append_rel_partitions
* Return the list of partitions of rel that pass the clauses mentioned
* rel->baserestrictinfo
2. Variable should be declared in inner scope with the following fragment:
void
set_basic_child_rel_properties(PlannerInfo *root,
RelOptInfo *rel,
RelOptInfo *childrel,
AppendRelInfo *appinfo)
{
AttrNumber attno;
if (rel->part_scheme)
{
which makes the code the same as where you moved it from.
It seems like you included the above changes in your attached C file,
which I will incorporate into my repository.
3. Normally lfirst() is assigned to a variable at the start of a
foreach() loop. You have code which does not follow this.
foreach(lc, clauses)
{
Expr *clause;
int i;
if (IsA(lfirst(lc), RestrictInfo))
{
RestrictInfo *rinfo = lfirst(lc);
You could assign this to a Node * since the type is unknown to you at
the start of the loop.
Will make the suggested changes to match_clauses_to_partkey().
4.
/*
* Useless if what we're thinking of as a constant is actually
* a Var coming from this relation.
*/
if (bms_is_member(rel->relid, constrelids))
continue;
should this be moved to just above the op_strict() test? This one seems cheaper.
Agreed, will do. Also makes sense to move the PartCollMatchesExprColl()
test together.
5. Typo "paritions": /* No clauses to prune paritions, so scan all
partitions. */
But thinking about it more, the comment should be something more along the
lines of /* No useful clauses for partition pruning. Scan all
partitions. */
You fixed it. :)
The key difference is that there might be clauses, just without Consts.
Actually, the more I look at get_append_rel_partitions() I think it
would be better if you re-shaped that if/else if test so that it only
performs the loop over the partindexes if it's been set.
I ended up with the attached version of the function after moving
things around a little bit.
Thanks a lot for that. Looks much better now.
I'm still reviewing but thought I'd share this part so far.
As mentioned at the top, I'm looking at your latest comments and they all
seem to be good points to me, so will address those in the next version.
Thanks,
Amit
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
Hi Amit,
I have tried pruning with different values of the constraint_exclusion
GUC. I'm not sure exactly how it should behave, but I can see that with a
delete statement pruning does not happen when constraint_exclusion is off,
whereas select works as expected. Is this expected behaviour?
create table lp (c1 int, c2 text) partition by list(c1);
create table lp1 partition of lp for values in (1,2);
create table lp2 partition of lp for values in (3,4);
create table lp3 partition of lp for values in (5,6);
insert into lp values (1,'p1'),(2,'p1'),(3,'p2'),(4,'p2'),(5,'p3');
show constraint_exclusion ;
constraint_exclusion
----------------------
partition
(1 row)
explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..29.05 rows=6 width=4)
-> Seq Scan on lp1 (cost=0.00..29.05 rows=6 width=4)
Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)
explain delete from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
----------------------------------------------------------
Delete on lp (cost=0.00..29.05 rows=6 width=6)
Delete on lp1
-> Seq Scan on lp1 (cost=0.00..29.05 rows=6 width=6)
Filter: ((c1 >= 1) AND (c1 < 2))
(4 rows)
set constraint_exclusion = off;
explain select c1 from lp where c1 >= 1 and c1 < 2;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..29.05 rows=6 width=4)
-> Seq Scan on lp1 (cost=0.00..29.05 rows=6 width=4)
Filter: ((c1 >= 1) AND (c1 < 2))
(3 rows)
*explain delete from lp where c1 >= 1 and c1 < 2;*
QUERY PLAN
----------------------------------------------------------
Delete on lp (cost=0.00..87.15 rows=18 width=6)
Delete on lp1
Delete on lp2
Delete on lp3
-> Seq Scan on lp1 (cost=0.00..29.05 rows=6 width=6)
Filter: ((c1 >= 1) AND (c1 < 2))
-> Seq Scan on lp2 (cost=0.00..29.05 rows=6 width=6)
Filter: ((c1 >= 1) AND (c1 < 2))
-> Seq Scan on lp3 (cost=0.00..29.05 rows=6 width=6)
Filter: ((c1 >= 1) AND (c1 < 2))
(10 rows)
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Hi Rajkumar,
Thanks for testing.
On 2017/11/08 15:52, Rajkumar Raghuwanshi wrote:
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
Hi Amit,
I have tried pruning with different values of the constraint_exclusion
GUC. I'm not sure exactly how it should behave, but I can see that with a
delete statement pruning does not happen when constraint_exclusion is off,
whereas select works as expected. Is this expected behaviour?
Hmm, the new pruning only works for selects, not DML. The patch also
changes get_relation_constraints() to not include the internal partition
constraints, but mistakenly does so for all query types, not just select.
Will look into it.
Thanks,
Amit
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/06 14:32, David Rowley wrote:
On 6 November 2017 at 17:30, Amit Langote wrote:
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote wrote:
[....]
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
I am getting the following warnings on Mac OS X:
partition.c:1800:24: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
partition.c:1932:25: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
2 warnings generated.
Comment for 0004 patch:
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
I think we should not use memset to set a value other than 0 or true/false.
This will work for -1 on systems where values are stored in two's
complement, but I am worried about other architectures.
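One way to avoid both the memset(-1) and the comparison against -1 is sketched below. This is only an assumption about a possible shape, not what the patch ends up doing: toy_nulltest stands in for NullTestType, TOY_MAX_KEYS for PARTITION_MAX_KEYS, and the separate keyisset[] array is a hypothetical addition that records whether each slot has been filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_KEYS 32     /* stand-in for PARTITION_MAX_KEYS */

/* Stand-in for NullTestType; it has no "invalid" member. */
typedef enum { TOY_IS_NULL, TOY_IS_NOT_NULL } toy_nulltest;

typedef struct
{
    toy_nulltest keynullness[TOY_MAX_KEYS];
    bool         keyisset[TOY_MAX_KEYS];    /* is keynullness[i] valid? */
} toy_keys;

static void
toy_keys_init(toy_keys *k)
{
    /*
     * Zero-filling is safe here: all-zero bytes read back as false for
     * the bools and as the first enumerator for the enum members.
     */
    memset(k, 0, sizeof(*k));
}

static void
toy_keys_set(toy_keys *k, int i, toy_nulltest t)
{
    assert(i >= 0 && i < TOY_MAX_KEYS);
    k->keynullness[i] = t;
    k->keyisset[i] = true;
}
```

Callers then test keyisset[i] instead of comparing keynullness[i] with an out-of-range constant, which also avoids the compiler warnings quoted above.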
Regards,
Amul
Hi Amul.
On 2017/11/09 20:05, amul sul wrote:
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/06 14:32, David Rowley wrote:
On 6 November 2017 at 17:30, Amit Langote wrote:
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote wrote:
[....]
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
I am getting the following warnings on Mac OS X:
Thanks for the review.
partition.c:1800:24: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
partition.c:1932:25: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
2 warnings generated.
Comment for 0004 patch:
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
I think we should not use memset to set a value other than 0 or true/false.
This will work for -1 on systems where values are stored in two's
complement, but I am worried about other architectures.
OK, I will remove all instances of comparing and setting variables of type
NullTestType to a value of -1.
Thanks,
Amit
Hello,
At Fri, 10 Nov 2017 09:34:57 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <5fcb1a9f-b4ad-119d-14c7-282c30c7f8d1@lab.ntt.co.jp>
Hi Amul.
On 2017/11/09 20:05, amul sul wrote:
On Mon, Nov 6, 2017 at 3:31 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/06 14:32, David Rowley wrote:
On 6 November 2017 at 17:30, Amit Langote wrote:
On 2017/11/03 13:32, David Rowley wrote:
On 31 October 2017 at 21:43, Amit Langote wrote:
[....]
Attached updated set of patches, including the fix to make the new pruning
code handle Boolean partitioning.
I am getting the following warnings on Mac OS X:
Thanks for the review.
partition.c:1800:24: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
partition.c:1932:25: warning: comparison of constant -1 with
expression of type 'NullTestType'
(aka 'enum NullTestType') is always false
[-Wtautological-constant-out-of-range-compare]
if (keynullness[i] == -1)
~~~~~~~~~~~~~~ ^ ~~
2 warnings generated.
Comment for 0004 patch:
+ /* -1 represents an invalid value of NullTestType. */
+ memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType));
I think we should not use memset to set a value other than 0 or true/false.
This will work for -1 on systems where values are stored in two's
complement, but I am worried about other architectures.
OK, I will remove all instances of comparing and setting variables of type
NullTestType to a value of -1.
In 0002, bms_add_range has a bit naive-looking loop
+ while (wordnum <= uwordnum)
+ {
+ bitmapword mask = (bitmapword) ~0;
+
+ /* If working on the lower word, zero out bits below 'lower'. */
+ if (wordnum == lwordnum)
+ {
+ int lbitnum = BITNUM(lower);
+ mask >>= lbitnum;
+ mask <<= lbitnum;
+ }
+
+ /* Likewise, if working on the upper word, zero bits above 'upper' */
+ if (wordnum == uwordnum)
+ {
+ int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+ mask <<= ushiftbits;
+ mask >>= ushiftbits;
+ }
+
+ a->words[wordnum++] |= mask;
+ }
Without some aggressive optimization, the loop spends most of its
time on check-and-jump for nothing, especially with many
partitions, and it is somewhat unintuitive.
The following uses a slightly tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
/* fill up intermediate words */
while (wordnum < uwordnum)
a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
(~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
In 0003,
+match_clauses_to_partkey(RelOptInfo *rel,
...
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
We call this function in both conjunction and disjunction
contexts (the latter only in the recursive case). constfalse ==
true means there is no need for any clauses in the former case.
Since (I think) a plain list of RestrictInfo is expected to be
treated as a conjunction (it's quite dubious, though..), we
might do better to call this for each subnode of a disjunction. Or,
like match_clauses_to_index, we might do better to provide
match_clause_to_partkey(rel, rinfo, contains_const), which
returns NULL if constfalse. (I'm not confident about this..)
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
I suppose it's better to leave leftop and rightop as they are rather than
flipping them so that the var is placed on the left side. Does that make
things too complex?
+ * It the operator happens to be '<>', which is never listed
If?
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
classify_partition_bounding_keys() checks the same thing by
checking whether the negator's strategy is
BTEqualStrategyNumber. (I'm not sure the operator is guaranteed
to be a btree one, though..) Shouldn't these be done in a similar
way?
# In the function, "partkey strategy" and "operator strategy" are
# confusing..
+ AttrNumber attno;
This declaration might be better in a narrower scope.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, this is the second part of the review.
At Fri, 10 Nov 2017 12:30:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20171110.123000.151902771.horiguchi.kyotaro@lab.ntt.co.jp>
In 0002, bms_add_range has a bit naive-looking loop
In 0003,
In 0004,
About the name get_partitions_from_clauses_guts(): it seems to me that
we usually use _internal for the recursive part of a function. (I
have the same comment as David about the comment for
get_partition_from_clause())
About the same function:
Couldn't we get out in the fast path when clauses == NIL?
+ /* No constraints on the keys, so, return *all* partitions. */
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
This allows us to return immediately here. And just above this,
+ if (nkeys > 0 && !constfalse)
+ result = get_partitions_for_keys(relation, &keys);
+ else if (!constfalse)
Those two conditions are not orthogonal. Maybe something like the
following would be more understandable.
if (!constfalse)
{
/* No constraints on the keys, so, return *all* partitions. */
if (nkeys == 0)
return bms_add_range(result, 0, partdesc->nparts - 1);
result = get_partitions_for_keys(relation, &keys);
}
I'm not sure what is meant to be (formally) done here, but it
seems to me that OrExpr is assumed to be only at the top level of
the clauses. So the following expression (just an example, but a
meaningful expression of this shape must exist) is perhaps
wrongly processed here.
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);
SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);
get_partitions_for_keys() returns both c1 and c2 and still
or_clauses here holds (a = 15 OR a = 5) and the function tries to
*add* partitions for a = 15 and a = 5 separately.
I'd like to pause here.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Ooops! The following comment is wrong. Please ignore it.
At Fri, 10 Nov 2017 14:38:11 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20171110.143811.97616847.horiguchi.kyotaro@lab.ntt.co.jp>
Those two conditions are not orthogonal. Maybe something like the
following would be more understandable.
if (!constfalse)
{
/* No constraints on the keys, so, return *all* partitions. */
if (nkeys == 0)
return bms_add_range(result, 0, partdesc->nparts - 1);
result = get_partitions_for_keys(relation, &keys);
}
I'm not sure what is meant to be (formally) done here, but it
seems to me that OrExpr is assumed to be only at the top level of
the clauses. So the following expression (just an example, but a
meaningful expression of this shape must exist) is perhaps
wrongly processed here.
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);
SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);
get_partitions_for_keys() returns both c1 and c2 and still
or_clauses here holds (a = 15 OR a = 5) and the function tries to
*add* partitions for a = 15 and a = 5 separately.
This is working fine. Sorry for the bogus comment.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
At Fri, 10 Nov 2017 14:44:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20171110.144455.117208639.horiguchi.kyotaro@lab.ntt.co.jp>
Those two conditions are not orthogonal. Maybe something like the
following would be more understandable.
if (!constfalse)
{
/* No constraints on the keys, so, return *all* partitions. */
if (nkeys == 0)
return bms_add_range(result, 0, partdesc->nparts - 1);
result = get_partitions_for_keys(relation, &keys);
}
So, in the (!constfalse && nkeys == 0) case, we cannot return
there. I was badly confused by the variable name.
I couldn't find another reasonable structure using the current
classify_p_b_keys(), but could you add a comment like the
following as an example?
+ /*
+ * This function processes everything other than OR expressions and
+ * returns the excluded OR expressions in or_clauses
+ */
nkeys = classify_partition_bounding_keys(relation, clauses,
&keys, &constfalse,
&or_clauses);
/*
* Only look up in the partition descriptor if the query provides
* constraints on the keys at all.
*/
if (!constfalse)
{
if (nkeys > 0)
result = get_partitions_for_keys(relation, &keys);
else
-+ /* No constraints on the keys, so, all partitions are passed. */
result = bms_add_range(result, 0, partdesc->nparts - 1);
}
+ /*
+ * We have a partition set for clauses not returned in or_clauses
+ * here. Conjunct it with the result of each OR clause.
+ */
foreach(lc, or_clauses)
{
BoolExpr *or = (BoolExpr *) lfirst(lc);
ListCell *lc1;
Bitmapset *or_partset = NULL;
+ Assert(or_clause(or));
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 10 November 2017 at 16:30, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
In 0002, bms_add_range has a bit naive-looking loop
+ while (wordnum <= uwordnum)
+ {
+ bitmapword mask = (bitmapword) ~0;
+
+ /* If working on the lower word, zero out bits below 'lower'. */
+ if (wordnum == lwordnum)
+ {
+ int lbitnum = BITNUM(lower);
+ mask >>= lbitnum;
+ mask <<= lbitnum;
+ }
+
+ /* Likewise, if working on the upper word, zero bits above 'upper' */
+ if (wordnum == uwordnum)
+ {
+ int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+ mask <<= ushiftbits;
+ mask >>= ushiftbits;
+ }
+
+ a->words[wordnum++] |= mask;
+ }
Without some aggressive optimization, the loop spends most of its
time on check-and-jump for nothing, especially with many
partitions, and it is somewhat unintuitive.
The following uses a slightly tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
/* fill up intermediate words */
while (wordnum < uwordnum)
a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
(~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
No objections here for making bms_add_range() perform better, but this
is not going to work when lwordnum == uwordnum. You'd need to special
case that. I didn't think it was worth the trouble, but maybe it is...
I assume the += should be |=.
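For what it's worth, the special case could look roughly like the sketch below. It operates on a bare array of 64-bit words rather than a Bitmapset, so it does not cover allocation or enlarging the set, and toy_add_range, TOY_BITS, TOY_WORDNUM and TOY_BITNUM are hypothetical names. It shows the two edge masks being combined when lower and upper fall in the same word, and whole words being filled in between otherwise.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BITS        64
#define TOY_WORDNUM(x)  ((x) / TOY_BITS)
#define TOY_BITNUM(x)   ((x) % TOY_BITS)

/*
 * Set bits lower..upper (inclusive) in words[].  The caller must make
 * sure words[] has at least TOY_WORDNUM(upper) + 1 elements.
 */
static void
toy_add_range(uint64_t *words, int lower, int upper)
{
    int      lwordnum;
    int      uwordnum;
    int      wordnum;
    uint64_t lmask;
    uint64_t umask;

    assert(lower >= 0 && lower <= upper);

    lwordnum = TOY_WORDNUM(lower);
    uwordnum = TOY_WORDNUM(upper);

    /* Bits TOY_BITNUM(lower) and above within the lower word. */
    lmask = ~(uint64_t) 0 << TOY_BITNUM(lower);
    /* Bits TOY_BITNUM(upper) and below within the upper word. */
    umask = ~(uint64_t) 0 >> (TOY_BITS - 1 - TOY_BITNUM(upper));

    /* Special case: the whole range lives in a single word. */
    if (lwordnum == uwordnum)
    {
        words[lwordnum] |= lmask & umask;
        return;
    }

    words[lwordnum] |= lmask;
    for (wordnum = lwordnum + 1; wordnum < uwordnum; wordnum++)
        words[wordnum] = ~(uint64_t) 0;
    words[uwordnum] |= umask;
}
```

The edge words use |= so that members already in the set are preserved, while the intermediate words can be assigned outright because every bit in them ends up set anyway.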
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Horiguchi-san,
Thanks for taking a look. Replying to all your emails here.
On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
In 0002, bms_add_range has a bit naive-looking loop
+ while (wordnum <= uwordnum)
+ {
+ bitmapword mask = (bitmapword) ~0;
+
+ /* If working on the lower word, zero out bits below 'lower'. */
+ if (wordnum == lwordnum)
+ {
+ int lbitnum = BITNUM(lower);
+ mask >>= lbitnum;
+ mask <<= lbitnum;
+ }
+
+ /* Likewise, if working on the upper word, zero bits above 'upper' */
+ if (wordnum == uwordnum)
+ {
+ int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+ mask <<= ushiftbits;
+ mask >>= ushiftbits;
+ }
+
+ a->words[wordnum++] |= mask;
+ }
Without some aggressive optimization, the loop spends most of its
time on check-and-jump for nothing, especially with many
partitions, and it is somewhat unintuitive.
The following uses a slightly tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
/* fill up intermediate words */
while (wordnum < uwordnum)
a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
(~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
Also considering David's comment downthread, I will try to incorporate
this into bms_add_range().
In 0003,
+match_clauses_to_partkey(RelOptInfo *rel,
...
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
We call this function in both conjunction and disjunction
contexts (the latter only in the recursive case). constfalse ==
true means there is no need for any clauses in the former case.
Since (I think) a plain list of RestrictInfo is expected to be
treated as a conjunction (it's quite dubious, though..),
I think it makes sense to consider a list of RestrictInfo's, such as
baserestrictinfo, that is passed as input to match_clauses_to_partkey(),
to be mutually conjunctive for our purpose here.
we
might do better to call this for each subnode of a disjunction. Or,
like match_clauses_to_index, we might do better to provide
match_clause_to_partkey(rel, rinfo, contains_const), which
returns NULL if constfalse. (I'm not confident about this..)
After reading your comment, I realized that it was wrong that the
recursive call to match_clauses_to_partkey() passed the arguments of an OR
clause all at once. That broke the assumption mentioned above that all of
the clauses in the list passed to match_clauses_to_partkey() are mutually
conjunctive. Instead, we must make a single-member list for each of the
OR clause's arguments and pass the same.
Then, if we get constfalse for all of the OR's arguments, we return
constfalse=true to the original caller.
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
I suppose it's better to leave leftop and rightop as they are rather than
flipping them so that the var is placed on the left side. Does that make
things too complex?
The reason to do it that way is that code placed far away (the code in
partition.c that extracts constant values to use for pruning from matched
clauses) can always assume that the clauses determined to be useful for
partition-pruning come in the 'partkey op constant' form.
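A toy version of that normalization, with integer constants and a closed set of comparison operators, might look like the following. The names toy_cmp, toy_clause, toy_commutator and toy_normalize are hypothetical. The actual code looks up the commutator operator in the catalogs and gives up when none exists, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT } toy_cmp;

typedef struct
{
    bool    key_on_left;  /* false means "constant op key" */
    toy_cmp op;
    int     constant;
} toy_clause;

/* Operator that gives the same result with its arguments swapped. */
static toy_cmp
toy_commutator(toy_cmp op)
{
    switch (op)
    {
        case TOY_LT: return TOY_GT;
        case TOY_LE: return TOY_GE;
        case TOY_EQ: return TOY_EQ;
        case TOY_GE: return TOY_LE;
        case TOY_GT: return TOY_LT;
    }
    return op;
}

/* Rewrite "constant op key" as "key op' constant"; leave others alone. */
static toy_clause
toy_normalize(toy_clause c)
{
    if (!c.key_on_left)
    {
        c.op = toy_commutator(c.op);
        c.key_on_left = true;
    }
    return c;
}
```

After this step, code that consumes the clauses only ever needs to handle the key-on-the-left shape, for example 5 < key becomes key > 5.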
+ * It the operator happens to be '<>', which is never listed
If?
Right, will fix.
+ if (!op_in_opfamily(expr_op, partopfamily))
+ {
+ Oid negator = get_negator(expr_op);
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
classify_partition_bounding_keys() checks the same thing by
checking whether the negator's strategy is
BTEqualStrategyNumber. (I'm not sure the operator is guaranteed
to be a btree one, though..) Shouldn't these be done in a similar
way?
You're right. The <>'s negator may not always be a btree operator. So,
we should add a check in match_clauses_to_partkey() that list or range
partitioning is in use, because only those require a btree operator
family. We now have hash partitioning, so need to be careful not to make
the assumption that all partitioning operators are from btree operator
families.
If match_clauses_to_partkey() accepts such a clause with a <> operator,
then classify_partition_bounding_keys() can rely on getting the
desired btree operators to implement pruning for the same.
# In the function, "partkey strategy" and "operator strategy" are
# confusing..
I agree it would be better to make that clear using comments.
Partitioning strategy and operator strategies are intimately related.
List and range partitioning related optimizations will only work if the
clause operators are of valid btree strategies, hash partitioning
optimizations will only work if the operator in the matched clauses is a
valid hash equality operator.
+ AttrNumber attno;
This declaration might be better in a narrower scope.
Agreed, will move.
On 2017/11/10 14:38, Kyotaro HORIGUCHI wrote:
Hello, this is the second part of the review.
At Fri, 10 Nov 2017 12:30:00 +0900 , Kyotaro HORIGUCHI wrote:
In 0003,
In 0004,
About the name get_partitions_from_clauses_guts(): it seems to me that
we usually use _internal for the recursive part of a function. (I
have the same comment as David about the comment for
get_partition_from_clause())
OK, will replace _guts by _internal. Looking at David's comments too.
About the same function:
Couldn't we get out in the fast path when clauses == NIL?
Actually get_partitions_for_clauses() won't be called at all if that were
true. I think I should add an Assert that clauses is not NIL.
+ /* No constraints on the keys, so, return *all* partitions. */
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
This allows us to return immediately here. And just above this,
+ if (nkeys > 0 && !constfalse)
+ result = get_partitions_for_keys(relation, &keys);
+ else if (!constfalse)
Those two conditions are not orthogonal. Maybe something like the
following would be more understandable.
if (!constfalse)
{
/* No constraints on the keys, so, return *all* partitions. */
if (nkeys == 0)
return bms_add_range(result, 0, partdesc->nparts - 1);
result = get_partitions_for_keys(relation, &keys);
}
Agreed that your suggested rewrite of that portion of the code is easy to
read, but we cannot return yet in the nkeys == 0 case, as you also said
you found out. I quote your other reply further below.
I'm not sure what is meant to be (formally) done here, but it
seems to me that OrExpr is assumed to be only at the top level of
the clauses. So the following expression (just an example, but a
meaningful expression of this shape must exist) is perhaps
wrongly processed here.
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);
SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);
get_partitions_for_keys() returns both c1 and c2 and still
or_clauses here holds (a = 15 OR a = 5) and the function tries to
*add* partitions for a = 15 and a = 5 separately.
I'd like to pause here.
[ ... ]
On 2017/11/10 14:44, Kyotaro HORIGUCHI wrote:
At Fri, 10 Nov 2017 14:38:11 +0900, Kyotaro HORIGUCHI wrote:
This is working fine. Sorry for the bogus comment.
I'd almost started looking around if something might be wrong after all. :)
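To illustrate why a query shaped like the one above can still prune correctly, here is a minimal standalone sketch. It is not the patch's code: each clause is mapped to a set of partitions (a bitmask here), AND intersects the sets and OR unions them. The bounds table and the names parts_for_eq() and prune_example() are hypothetical and mirror only the c1/c2 example.

```c
#include <assert.h>

/* Hypothetical model of the example: c1 = [0, 10), c2 = [10, 20). */
#define NPARTS 2
static const int part_lower[NPARTS] = {0, 10};
static const int part_upper[NPARTS] = {10, 20};

/* Bitmask of partitions that can hold rows with a = value (bit 0 is c1). */
static unsigned
parts_for_eq(int value)
{
    unsigned    result = 0;

    for (int i = 0; i < NPARTS; i++)
    {
        if (value >= part_lower[i] && value < part_upper[i])
            result |= 1u << i;
    }
    return result;
}

/* a = 15 AND (a = 15 OR a = 5): OR is a union, AND is an intersection. */
static unsigned
prune_example(void)
{
    unsigned    or_parts = parts_for_eq(15) | parts_for_eq(5);

    return parts_for_eq(15) & or_parts;
}
```

In this model prune_example() leaves only c2's bit set: the OR arm contributes both partitions, but intersecting with the top-level a = 15 removes c1 again.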
On 2017/11/10 16:07, Kyotaro HORIGUCHI wrote:
At Fri, 10 Nov 2017 14:44:55 +0900, Kyotaro HORIGUCHI wrote:
Those two conditions are not orthogonal. Maybe something like the
following seems more understandable.
if (!constfalse)
{
    /* No constraints on the keys, so, return *all* partitions. */
    if (nkeys == 0)
        return bms_add_range(result, 0, partdesc->nparts - 1);
    result = get_partitions_for_keys(relation, &keys);
}
So, the condition (!constfalse && nkeys == 0) cannot return
there. I'm badly confused by the variable name.
Do you mean by 'constfalse'?
I couldn't find another reasonable structure using the current
classify_p_b_keys(), but could you add a comment like the
following as an example?
OK, will add comments explaining what's going on.
Will post the updated patches after also taking care of David's and Amul's
review comments upthread.
Thanks,
Amit
On 13 November 2017 at 22:46, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
In 0002, bms_add_range has a bit naive-looking loop
+ while (wordnum <= uwordnum)
+ {
+     bitmapword mask = (bitmapword) ~0;
+
+     /* If working on the lower word, zero out bits below 'lower'. */
+     if (wordnum == lwordnum)
+     {
+         int lbitnum = BITNUM(lower);
+         mask >>= lbitnum;
+         mask <<= lbitnum;
+     }
+
+     /* Likewise, if working on the upper word, zero bits above 'upper' */
+     if (wordnum == uwordnum)
+     {
+         int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+         mask <<= ushiftbits;
+         mask >>= ushiftbits;
+     }
+
+     a->words[wordnum++] |= mask;
+ }
Without some aggressive optimization, the loop takes most of the
time to check-and-jump for nothing especially with many
partitions and somewhat unintuitive.
The following uses a bit tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
/* fill up intermediate words */
while (wordnum < uwordnum)
    a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
    (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
Considering also David's comment downthread, I will try to incorporate
this into bms_add_range().
I've attached an implementation of the patch using this method.
I've also attached bitset.c which runs each through their paces. I'd
have expected Kyotaro's method to be faster, but gcc 7.2 with -O2
generates very slightly slower code. I didn't really check why. clang
seems to do a better job with it.
$ gcc -O2 bitset.c -o bitset && ./bitset
bms_add_range in 0.694254 (6.94254 ns per loop)
bms_add_range2 in 0.726643 (7.26643 ns per loop)
11111111111111111111111111111110
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
-------------
11111111111111111111111111111110
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
$ gcc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang -O2 bitset.c -o bitset && ./bitset
bms_add_range in 0.866554 (8.66554 ns per loop)
bms_add_range2 in 0.467138 (4.67138 ns per loop)
11111111111111111111111111111110
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
-------------
11111111111111111111111111111110
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
$ clang --version
clang version 4.0.1-6 (tags/RELEASE_401/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Probably just go with Kyotaro's idea (v2). I don't think this is worth
debating, I just wanted to show it's not that clear-cut.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
bms_add_range_v2.patch (application/octet-stream)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index d4b82c6305..e5096e01a7 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -784,6 +784,78 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
return result;
}
+/*
+ * bms_add_range
+ * Add members in the range of 'lower' to 'upper' to the set.
+ *
+ * Note this could also be done by calling bms_add_member in a loop, however,
+ * using this function will be faster when the range is large as we work with
+ * at the bitmapword level rather than at bit level.
+ */
+Bitmapset *
+bms_add_range(Bitmapset *a, int lower, int upper)
+{
+ int lwordnum,
+ lbitnum,
+ uwordnum,
+ ushiftbits,
+ wordnum;
+
+ if (lower < 0 || upper < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+ if (lower > upper)
+ elog(ERROR, "lower range must not be above upper range");
+ uwordnum = WORDNUM(upper);
+
+ if (a == NULL)
+ {
+ a = (Bitmapset *) palloc0(BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ }
+
+ /* ensure we have enough words to store the upper bit */
+ else if (uwordnum >= a->nwords)
+ {
+ int oldnwords = a->nwords;
+ int i;
+
+ a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ /* zero out the enlarged portion */
+ for (i = oldnwords; i < a->nwords; i++)
+ a->words[i] = 0;
+ }
+
+ wordnum = lwordnum = WORDNUM(lower);
+
+ lbitnum = BITNUM(lower);
+ ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+
+ /*
+ * Special case when lwordnum is the same as uwordnum we must perform the
+ * upper and lower masking on the word.
+ */
+ if (lwordnum == uwordnum)
+ {
+ a->words[lwordnum] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1)
+ & (~(bitmapword) 0) >> ushiftbits;
+ }
+ else
+ {
+ /* turn on lbitnum and all bits left of it */
+ a->words[wordnum++] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1);
+
+ /* turn on all bits for any intermediate words */
+ while (wordnum < uwordnum)
+ a->words[wordnum++] = ~(bitmapword) 0;
+
+ /* turn on upper's bit and all bits right of it. */
+ a->words[uwordnum] |= (~(bitmapword) 0) >> ushiftbits;
+ }
+
+ return a;
+}
+
/*
* bms_int_members - like bms_intersect, but left input is recycled
*/
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index aa3fb253c2..3b62a97775 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,6 +90,7 @@ extern bool bms_is_empty(const Bitmapset *a);
extern Bitmapset *bms_add_member(Bitmapset *a, int x);
extern Bitmapset *bms_del_member(Bitmapset *a, int x);
extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_join(Bitmapset *a, Bitmapset *b);
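For readers who want to try the masking outside the PostgreSQL tree, the following is a standalone sketch modelled on the patch above. It is an illustration under stated assumptions, not the committed code: it works on a plain uint32_t array instead of a Bitmapset, does no allocation, and the names set_range_bitwise(), set_range_wordwise() and range_versions_agree() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 32
#define WORDNUM(x) ((x) / BITS_PER_WORD)
#define BITNUM(x)  ((x) % BITS_PER_WORD)
#define MAXWORDS 8

/* Reference version: set one bit at a time. */
static void
set_range_bitwise(uint32_t *words, int lower, int upper)
{
    for (int i = lower; i <= upper; i++)
        words[WORDNUM(i)] |= (uint32_t) 1 << BITNUM(i);
}

/* Word-at-a-time version, following the structure of the patch. */
static void
set_range_wordwise(uint32_t *words, int lower, int upper)
{
    int         lwordnum = WORDNUM(lower);
    int         uwordnum = WORDNUM(upper);
    int         ushiftbits = BITS_PER_WORD - (BITNUM(upper) + 1);
    /* bits at and above 'lower' within its word */
    uint32_t    lmask = ~(((uint32_t) 1 << BITNUM(lower)) - 1);
    /* bits at and below 'upper' within its word */
    uint32_t    umask = (~(uint32_t) 0) >> ushiftbits;

    assert(lower >= 0 && lower <= upper);

    if (lwordnum == uwordnum)
        words[lwordnum] |= lmask & umask;   /* both masks hit one word */
    else
    {
        int         wordnum = lwordnum;

        words[wordnum++] |= lmask;
        while (wordnum < uwordnum)
            words[wordnum++] = ~(uint32_t) 0;
        words[uwordnum] |= umask;
    }
}

/* Returns 1 if both versions agree for every range below nbits. */
static int
range_versions_agree(int nbits)
{
    assert(nbits <= MAXWORDS * BITS_PER_WORD);
    for (int lower = 0; lower < nbits; lower++)
    {
        for (int upper = lower; upper < nbits; upper++)
        {
            uint32_t    a[MAXWORDS] = {0};
            uint32_t    b[MAXWORDS] = {0};

            set_range_bitwise(a, lower, upper);
            set_range_wordwise(b, lower, upper);
            for (int i = 0; i < MAXWORDS; i++)
            {
                if (a[i] != b[i])
                    return 0;
            }
        }
    }
    return 1;
}
```

Checking the word-wise version against the bit-by-bit one over every (lower, upper) pair is a cheap way to catch off-by-one errors in the shift counts, which is where this kind of code tends to go wrong.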
Hi David.
On 2017/11/14 13:00, David Rowley wrote:
On 13 November 2017 at 22:46, Amit Langote wrote:
On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
The following uses a bit tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
/* fill up intermediate words */
while (wordnum < uwordnum)
    a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
    (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
Considering also David's comment downthread, I will try to incorporate
this into bms_add_range().
I've attached an implementation of the patch using this method.
[ ... ]
Probably just go with Kyotaro's idea (v2). I don't think this is worth
debating, I just wanted to show it's not that clear-cut.
Thanks. I have incorporated the v2 patch in my local repository. I'm
still working through some of the review comments and will be able to
post a new version no later than tomorrow, including support for the new
hash partitioning.
Thanks,
Amit
Thanks for the interesting test, David.
At Tue, 14 Nov 2017 17:00:12 +1300, David Rowley <david.rowley@2ndquadrant.com> wrote in <CAKJS1f-1Lc_b=Y1iicPQzvUgSn1keHSgmRqLuOGq_VR6M==zbw@mail.gmail.com>
On 13 November 2017 at 22:46, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
Without some aggressive optimization, the loop takes most of the
time to check-and-jump for nothing especially with many
partitions and somewhat unintuitive.
The following uses a bit tricky bitmap operation but
is straightforward as a whole.
=====
/* fill the bits upper from BITNUM(lower) (0-based) of the first word */
a->words[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
'+='.. ^^;
/* fill up intermediate words */
while (wordnum < uwordnum)
    a->words[wordnum++] = ~(bitmapword) 0;
/* fill up to BITNUM(upper) bit (0-based) of the last word */
a->words[wordnum++] |=
    (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
=====
Considering also David's comment downthread, I will try to incorporate
this into bms_add_range().
I've attached an implementation of the patch using this method.
I've also attached bitset.c which runs each through their paces. I'd
have expected Kyotaro's method to be faster, but gcc 7.2 with -O2
generates very slightly slower code. I didn't really check why. clang
seems to do a better job with it.
..
$ gcc -O2 bitset.c -o bitset && ./bitset
bms_add_range in 0.694254 (6.94254 ns per loop)
bms_add_range2 in 0.726643 (7.26643 ns per loop)
..
$ gcc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Hmm, bms_add_range doesn't seem to be getting such aggressive optimization,
but I had a similar result.
Looking at the output of gcc -S, I found that bms_add_range() is
inlined into main() (gcc 7.1.0). That's not surprising in hindsight,
but.. Anyway, I added __attribute((noinline)) to the two
functions and got the following result.
bms_add_range in 1.24 (12.4 ns per loop)
bms_add_range2 in 0.8 (8 ns per loop)
It seems reasonable.
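A minimal sketch of that kind of harness follows, for anyone who wants to repeat the measurement. It is not the attached bitset.c: the names fill_range() and seconds_for_calls() are hypothetical, the attribute spelling is the GCC/clang one, and the volatile sink is only there so the compiler cannot discard the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Keep the function out of line so the call itself is what gets timed. */
static __attribute__((noinline)) void
fill_range(uint32_t *words, int lower, int upper)
{
    for (int i = lower; i <= upper; i++)
        words[i / 32] |= (uint32_t) 1 << (i % 32);
}

/* Time ncalls invocations; returns elapsed CPU seconds. */
static double
seconds_for_calls(long ncalls)
{
    uint32_t    words[4] = {0};
    volatile uint32_t sink = 0;
    clock_t     start = clock();

    for (long i = 0; i < ncalls; i++)
    {
        fill_range(words, 1, 100);
        sink ^= words[i & 3];   /* consume the result each iteration */
    }
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by ncalls gives a per-call figure comparable in spirit to the "ns per loop" numbers above, though absolute values will differ by machine and compiler.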
$ clang -O2 bitset.c -o bitset && ./bitset
bms_add_range in 0.866554 (8.66554 ns per loop)
bms_add_range2 in 0.467138 (4.67138 ns per loop)
..
$ clang --version
clang version 4.0.1-6 (tags/RELEASE_401/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Probably just go with Kyotaro's idea (v2). I don't think this is worth
debating, I just wanted to show it's not that clear-cut.
I agree that it's not so clear-cut.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 16 November 2017 at 15:54, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Anyway I added __attribute((noinline)) to the two
functions and got the following result.
bms_add_range in 1.24 (12.4 ns per loop)
bms_add_range2 in 0.8 (8 ns per loop)
I see similar here with __attribute((noinline)). Thanks for
investigating that. Your way is clearly better. Thanks for suggesting
it.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2017/11/08 13:44, David Rowley wrote:
On 7 November 2017 at 01:52, David Rowley <david.rowley@2ndquadrant.com> wrote:
Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)
Hi Amit,
I had another look over this today. Apologies if any of the review seems petty.
Thanks a lot for the review.
Here goes:
1. The if test seems to be testing for a child that's a partitioned table,
but is testing for a non-NULL part_scheme.
/*
 * If childrel is itself partitioned, add it and its partitioned
 * children to the list being propagated up to the root rel.
 */
if (childrel->part_scheme && rel->part_scheme)
Should this code use IS_PARTITIONED_REL() instead? Seems a bit strange
to test for a NULL part_scheme.
I guess that makes sense, done.
2. There's a couple of mistakes in my bms_add_range() code. I've
attached bms_add_range_fix.patch. Can you apply this to your tree?
Thanks. I have used your bms_add_range_v2.patch that you sent earlier
today and listed both your and Horiguchi-san's names as author.
3. This assert seems to be Asserting the same thing twice:
Assert(rel->live_partitioned_rels != NIL &&
       list_length(rel->live_partitioned_rels) > 0);
A List with length == 0 is always NIL.
You're right. I changed it to:
Assert(list_length(rel->live_partitioned_rels) >= 1);
4. get_partitions_from_clauses(), can you comment why you perform the
list_concat() there.
I believe this is there so that the partition bound from the parent is
passed down to the child so that we can properly eliminate all child
partitions when the 2nd level of partitioning is using the same
partition key as the 1st level. I think this deserves a paragraph of
comment to explain this.
Yes, that's the intent. I implemented it as a solution to fix a problem
that was reported upthread, whereby the default partition pruning didn't
work as desired. I tried to explain it in the following email:
/messages/by-id/8499324c-8a33-4be7-9d23-7e6a95e60ddf@lab.ntt.co.jp
But, since I devised it as a solution to get the desired behavior for the
default partition, I modified the code to include partition constraint
clauses to do it only when the table has a default partition in the first
place. Doing it always is overkill. Please see the comment added
nearby if it now helps make sense of what's going on.
5. Please add a comment to explain what's going on here in
classify_partition_bounding_keys()
if (partattno == 0)
{
    partexpr = lfirst(partexprs_item);
    partexprs_item = lnext(partexprs_item);
}
Looks like, similar to index expressions, partition expressions use
attno 0 to mark that the next expression should be consumed from the list.
Right.
Does this need validation that there are enough partexprs_item items
like what is done in get_range_key_properties()? Or is this validated
somewhere else?
Yeah, I added the check as follows:
+ /* Set partexpr if needed. */
if (partattno == 0)
{
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
partexpr = lfirst(partexprs_item);
partexprs_item = lnext(partexprs_item);
6. Comment claims the if test will test something which it does not
seem to test for:
/*
 * Redundant key elimination using btree-semantics based tricks.
 *
 * Only list and range partitioning use btree operator semantics, so
 * skip otherwise. Also, if there are expressions whose value is yet
 * unknown, skip this step, because we need to compare actual values
 * below.
 */
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
    partkey->strategy == PARTITION_STRATEGY_RANGE)
I was expecting this to be skipped when the clauses contained a
non-const, but it does not seem to.
Fixed the comment. Actually we might end up with non-consts here if the
executor invokes it, so the downstream code is in a position to handle them,
skipping any optimizations that depend on constant values being available.
There are actually even cases where the planner wouldn't mind calling here
even if the matched clauses contained non-const operands, as long as there
are at least some constants available.
7. Should be "compare them"
/*
* If the leftarg and rightarg clauses' constants are both of the type
* expected by "op" clause's operator, then compare then using the
* latter's comparison function.
 */
But if I look at the code "compare then using the latter's comparison
function." is not true, it seems to use op's comparison function not
rightarg's. With most of the calls op and rightarg are the same, but
not all of them. The function shouldn't make that assumption even if
the args op was always the same as rightarg.
Rearranged the code in partition_cmp_args() a bit and added more
clarifying comments.
8. remove_redundant_clauses() needs an overview comment of what the
function does.
Done.
9. The comment should explain what we would do in the case of key < 3
AND key <= 2 using some examples.
/* try to keep only one of <, <= */
Done.
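As a worked illustration of the "key < 3 AND key <= 2" case, here is a small sketch of how one of the two upper-bound clauses can be dropped. It is a simplified integer-only model, not the patch's remove_redundant_clauses(); the UpperBound type and tighter_upper_bound() are hypothetical names.

```c
#include <assert.h>

typedef enum { OP_LT, OP_LE } UpperOp;

typedef struct
{
    UpperOp     op;     /* which operator survives */
    int         value;  /* its constant */
} UpperBound;

/*
 * Given "key < lt_value AND key <= le_value", return the one clause
 * that implies the other.  If le_value < lt_value, "<= le_value" is
 * the tighter bound; otherwise "< lt_value" is at least as tight.
 */
static UpperBound
tighter_upper_bound(int lt_value, int le_value)
{
    UpperBound  result;

    if (le_value < lt_value)
    {
        result.op = OP_LE;
        result.value = le_value;
    }
    else
    {
        result.op = OP_LT;
        result.value = lt_value;
    }
    return result;
}
```

For key < 3 AND key <= 2 this keeps key <= 2; for key < 3 AND key <= 3 it keeps key < 3, since every value below 3 already satisfies key <= 3.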
10. Wondering why this loop runs backward?
for (s = BTMaxStrategyNumber; --s >= 0;)
Why not just:
for (s = 0; s < BTMaxStrategyNumber; s++)
I can't see a special reason for it to run backward. It seems unusual,
but if there's a good reason that I've failed to realise then it's
maybe worth a comment.
Hmm, no special reason. So, done the other way. I actually brought this
redundant key elimination logic over from nbtutils.c:
_bt_preprocess_keys() and the loop runs that way over there.
11. Please comment on why *constfalse = true is set here:
if (!chk || s == (BTEqualStrategyNumber - 1))
    continue;
if (partition_cmp_args(partopfamily, partopcintype, chk, eq, chk,
                       &test_result))
{
    if (!test_result)
    {
        *constfalse = true;
        return;
    }
    /* discard the redundant key. */
    xform[s] = NULL;
}
Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.
Right. Added a comment with example.
Also please add a comment when discarding the redundant key maybe
explain that equality is more useful than the other strategies when
there's an overlap.
Done, too.
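The equality check discussed in item 11 can be sketched as follows. This is a simplified integer model and not the patch's partition_cmp_args(): the enum names and check_against_equality() are hypothetical. The idea is that once "key = c" is known, any other comparison on the same key is either implied by it (and can be discarded) or contradicts it (and the whole qual is constant-false).

```c
#include <assert.h>

typedef enum { CMP_LT, CMP_LE, CMP_GE, CMP_GT } CmpOp;
typedef enum { KEEP_EQ_ONLY, CONTRADICTION } EqCheck;

/*
 * Test "key <op> op_value" with key replaced by eq_value.  If it holds,
 * the clause is redundant next to "key = eq_value"; if it fails, no row
 * can satisfy both clauses.
 */
static EqCheck
check_against_equality(int eq_value, CmpOp op, int op_value)
{
    int         holds = 0;

    switch (op)
    {
        case CMP_LT:
            holds = eq_value < op_value;
            break;
        case CMP_LE:
            holds = eq_value <= op_value;
            break;
        case CMP_GE:
            holds = eq_value >= op_value;
            break;
        case CMP_GT:
            holds = eq_value > op_value;
            break;
    }
    return holds ? KEEP_EQ_ONLY : CONTRADICTION;
}
```

With key = 1 AND key > 1 the check reports a contradiction, which corresponds to setting *constfalse; with key = 1 AND key >= 1 the second clause is redundant and only the equality needs to be kept.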
Please find attached updated patch set. There are significant changes in
this version as described below, including the support for hash
partitioned tables.
Earlier today, I reported [1] what looks to me like a bug in how default
partition's constraint gets generated and how that sometimes makes
constraint exclusion mistakenly prune a default partition. I have
included the patches I posted there in this series too. They are ahead in
the list.
So attached patches are now as follows:
0001-Add-default-partition-case-in-inheritance-testing.patch
0002-Tweak-default-range-partition-s-constraint-a-little.patch
Patches at [1].
0003-Add-new-tests-for-partition-pruning.patch
Tests. Mostly unchanged from the last version.
0004-Add-a-bms_add_range.patch
Uses the bms_add_range_v2 patch.
0005-Planner-side-changes-for-partition-pruning.patch
Fixed some issues with how OR clauses were matched and the code for
checking if the operator is partitioning compatible is no longer in the
planner code, instead it's now only in partition.c. Other cosmetic
improvements including those that resulted from the review comments.
0006-Implement-get_partitions_from_clauses.patch
Several bug fixes and many cosmetic improvements including improved
commentary. Per Amul's comment upthread, the patch no longer depends on
using value -1 to denote an invalid value of NulltestType enum. In fact,
it doesn't use NulltestType type variables at all.
0007-Some-interface-changes-for-partition_bound_-cmp-bsea.patch
No changes.
0008-Implement-get_partitions_for_keys.patch
Support for hash partitioning and tests for the same. Also, since
update/delete on partitioned tables still depend on constraint exclusion
for pruning, fix things such that get_relation_constraints includes
partition constraints in its result only for non-select queries (for
selects we have the new pruning code). Other bug fixes.
Thanks,
Amit
[1]: /messages/by-id/ba7aaeb1-4399-220e-70b4-62eade1522d0@lab.ntt.co.jp
Attachments:
0001-Add-default-partition-case-in-inheritance-testing.patch (text/plain; charset=UTF-8)
From 7b7891a416e34064d6fcd99f35b759dc746fbc0e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 17 Nov 2017 14:00:42 +0900
Subject: [PATCH 1/8] Add default partition case in inheritance testing
---
src/test/regress/expected/inherit.out | 29 +++++++++++++++++++----------
src/test/regress/sql/inherit.sql | 9 +++++----
2 files changed, 24 insertions(+), 14 deletions(-)
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..a202caeb25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1853,13 +1853,14 @@ drop table range_list_parted;
-- check that constraint exclusion is able to cope with the partition
-- constraint emitted for multi-column range partitioned tables
create table mcrparted (a int, b int, c int) partition by range (a, abs(b), c);
+create table mcrparted_def partition of mcrparted default;
create table mcrparted0 partition of mcrparted for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
create table mcrparted1 partition of mcrparted for values from (1, 1, 1) to (10, 5, 10);
create table mcrparted2 partition of mcrparted for values from (10, 5, 10) to (10, 10, 10);
create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20, 10, 10);
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
-explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
+explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
QUERY PLAN
------------------------------
Append
@@ -1867,7 +1868,7 @@ explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
Filter: (a = 0)
(3 rows)
-explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1
+explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
QUERY PLAN
---------------------------------------------
Append
@@ -1875,7 +1876,7 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scan
Filter: ((a = 10) AND (abs(b) < 5))
(3 rows)
-explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2
+explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
QUERY PLAN
---------------------------------------------
Append
@@ -1883,11 +1884,13 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scan
Filter: ((a = 10) AND (abs(b) = 5))
-> Seq Scan on mcrparted2
Filter: ((a = 10) AND (abs(b) = 5))
-(5 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((a = 10) AND (abs(b) = 5))
+(7 rows)
explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all partitions
- QUERY PLAN
-------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on mcrparted0
Filter: (abs(b) = 5)
@@ -1899,7 +1902,9 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-(11 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (abs(b) = 5)
+(13 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
@@ -1917,7 +1922,9 @@ explain (costs off) select * from mcrparted where a > -1; -- scans all partition
Filter: (a > '-1'::integer)
-> Seq Scan on mcrparted5
Filter: (a > '-1'::integer)
-(13 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (a > '-1'::integer)
+(15 rows)
explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c > 10; -- scans mcrparted4
QUERY PLAN
@@ -1927,7 +1934,7 @@ explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c >
Filter: ((c > 10) AND (a = 20) AND (abs(b) = 10))
(3 rows)
-explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5
+explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5, mcrparted_def
QUERY PLAN
-----------------------------------------
Append
@@ -1937,7 +1944,9 @@ explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mc
Filter: ((c > 20) AND (a = 20))
-> Seq Scan on mcrparted5
Filter: ((c > 20) AND (a = 20))
-(7 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((c > 20) AND (a = 20))
+(9 rows)
drop table mcrparted;
-- check that partitioned table Appends cope with being referenced in
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 169d0dc0f5..c71febffc2 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -664,19 +664,20 @@ drop table range_list_parted;
-- check that constraint exclusion is able to cope with the partition
-- constraint emitted for multi-column range partitioned tables
create table mcrparted (a int, b int, c int) partition by range (a, abs(b), c);
+create table mcrparted_def partition of mcrparted default;
create table mcrparted0 partition of mcrparted for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
create table mcrparted1 partition of mcrparted for values from (1, 1, 1) to (10, 5, 10);
create table mcrparted2 partition of mcrparted for values from (10, 5, 10) to (10, 10, 10);
create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20, 10, 10);
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
-explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
-explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1
-explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2
+explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
+explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
+explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all partitions
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c > 10; -- scans mcrparted4
-explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5
+explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5, mcrparted_def
drop table mcrparted;
-- check that partitioned table Appends cope with being referenced in
--
2.11.0
0002-Tweak-default-range-partition-s-constraint-a-little.patch (text/plain; charset=UTF-8)
From 6ba00b0844bf1064d4b6a996425e7b329c179356 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 31 Oct 2017 16:26:55 +0900
Subject: [PATCH 2/8] Tweak default range partition's constraint a little
When using as a predicate, it's useful for it explicitly say that
the default range partition might contain nulls, because non-default
range partitions can't.
---
src/backend/catalog/partition.c | 29 +++++++++++++++++++++++------
src/test/regress/expected/inherit.out | 12 ++++++++----
src/test/regress/expected/update.out | 2 +-
3 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 67d4c2a09b..5c13793481 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2133,12 +2133,29 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (or_expr_args != NIL)
{
- /* OR all the non-default partition constraints; then negate it */
- result = lappend(result,
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args, -1)
- : linitial(or_expr_args));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ Expr *other_parts_constr;
+
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
return result;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a202caeb25..fac7b62f9c 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1861,12 +1861,14 @@ create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
- QUERY PLAN
-------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on mcrparted0
Filter: (a = 0)
-(3 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (a = 0)
+(5 rows)
explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
QUERY PLAN
@@ -1874,7 +1876,9 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scan
Append
-> Seq Scan on mcrparted1
Filter: ((a = 10) AND (abs(b) < 5))
-(3 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((a = 10) AND (abs(b) < 5))
+(5 rows)
explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
QUERY PLAN
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index a4fe96112e..b69ceaa75e 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -227,7 +227,7 @@ create table part_def partition of range_parted default;
a | text | | | | extended | |
b | integer | | | | plain | |
Partition of: range_parted DEFAULT
-Partition constraint: (NOT (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20))))
+Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20)))))
insert into range_parted values ('c', 9);
-- ok
--
2.11.0
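A note on the update.out change in the patch above, where the default partition's constraint goes from NOT (c1 OR c2 ...) to NOT (key IS NOT NULL AND (c1 OR c2 ...)). Under SQL three-valued logic the two forms differ only when a key is NULL: the old form evaluates to NULL, the new form to true. Below is a minimal sketch of that evaluation for a single integer key and one non-default range [1, 10). All names here (TriBool, NullableInt, default_constr_old, default_constr_new) are hypothetical and not taken from the patch, and the sketch only evaluates the expressions; it does not model how the planner or executor treats a NULL result.

```c
#include <assert.h>
#include <stdbool.h>

/* SQL-style three-valued boolean: false, true, or NULL (unknown). */
typedef enum { TV_FALSE = 0, TV_TRUE = 1, TV_NULL = 2 } TriBool;

/* A nullable integer partition key (hypothetical helper type). */
typedef struct { bool isnull; int value; } NullableInt;

/* NOT x: NULL stays NULL. */
TriBool tv_not(TriBool x)
{
    if (x == TV_NULL)
        return TV_NULL;
    return (x == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

/* x AND y: false dominates, then NULL, else true. */
TriBool tv_and(TriBool x, TriBool y)
{
    if (x == TV_FALSE || y == TV_FALSE)
        return TV_FALSE;
    if (x == TV_NULL || y == TV_NULL)
        return TV_NULL;
    return TV_TRUE;
}

/* key IS NOT NULL: never yields NULL. */
TriBool is_not_null(NullableInt key)
{
    return key.isnull ? TV_FALSE : TV_TRUE;
}

/* (key >= lo) AND (key < hi): NULL when the key is NULL. */
TriBool in_range(NullableInt key, int lo, int hi)
{
    if (key.isnull)
        return TV_NULL;
    return (key.value >= lo && key.value < hi) ? TV_TRUE : TV_FALSE;
}

/* Old form: NOT (range test). */
TriBool default_constr_old(NullableInt key)
{
    return tv_not(in_range(key, 1, 10));
}

/* New form: NOT (key IS NOT NULL AND range test). */
TriBool default_constr_new(NullableInt key)
{
    return tv_not(tv_and(is_not_null(key), in_range(key, 1, 10)));
}

/* Walk through the three interesting cases. */
void check_examples(void)
{
    NullableInt null_key = { true, 0 };
    NullableInt inside = { false, 5 };
    NullableInt outside = { false, 20 };

    assert(default_constr_old(null_key) == TV_NULL);
    assert(default_constr_new(null_key) == TV_TRUE);
    assert(default_constr_old(inside) == default_constr_new(inside));
    assert(default_constr_old(outside) == default_constr_new(outside));
}
```

For non-NULL keys the two forms agree, so the added IS NOT NULL tests change the result only for NULL keys.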
Attachment: 0003-Add-new-tests-for-partition-pruning-v12.patch (text/plain; charset=UTF-8)
From 3d482714d6a410e0872565c3534bf52ca36bfb37 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 3/8] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 1087 +++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 155 +++++
4 files changed, 1244 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..963561dbfe
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,1087 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(31 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on rlp_default_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a is not null;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3abcd
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3efgh
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_10
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_30
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NOT NULL)
+(29 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_null
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(25 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (2, minvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain (costs off) select * from mc2p where a < 2;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p0
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default
+ Filter: (a < 2)
+(9 rows)
+
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p3
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain (costs off) select * from mc2p where a > 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p2
+ Filter: (a > 1)
+ -> Seq Scan on mc2p3
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default
+ Filter: (a > 1)
+(11 rows)
+
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p2
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+explain (costs off) select * from boolpart where a in (true, false);
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+ -> Seq Scan on boolpart_t
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+(5 rows)
+
+explain (costs off) select * from boolpart where a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_t
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_default
+ Filter: (NOT a)
+(7 rows)
+
+explain (costs off) select * from boolpart where not a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: a
+ -> Seq Scan on boolpart_t
+ Filter: a
+ -> Seq Scan on boolpart_default
+ Filter: a
+(7 rows)
+
+explain (costs off) select * from boolpart where a is true or a is not true;
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT TRUE)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true and a is not false;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS UNKNOWN)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT UNKNOWN)
+(7 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index aa5e6af621..38dfe618b5 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 3866314a92..17d88e5ca9 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..9dfcbe1e70
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,155 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;
+explain (costs off) select * from rlp where a is not null;
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (2, minvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain (costs off) select * from mc2p where a < 2;
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+explain (costs off) select * from mc2p where a > 1;
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+
+explain (costs off) select * from boolpart where a in (true, false);
+explain (costs off) select * from boolpart where a = false;
+explain (costs off) select * from boolpart where not a = false;
+explain (costs off) select * from boolpart where a is true or a is not true;
+explain (costs off) select * from boolpart where a is not true;
+explain (costs off) select * from boolpart where a is not true and a is not false;
+explain (costs off) select * from boolpart where a is unknown;
+explain (costs off) select * from boolpart where a is not unknown;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
--
2.11.0
0004-Add-a-bms_add_range-v12.patch (text/plain; charset=UTF-8)
From 9ee4249132238b99227f416abcfef14a9de9c8cd Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 14 Nov 2017 16:05:52 +0900
Subject: [PATCH 4/8] Add a bms_add_range()
Authors: David Rowley, Kyotaro Horiguchi
---
src/backend/nodes/bitmapset.c | 72 +++++++++++++++++++++++++++++++++++++++++++
src/include/nodes/bitmapset.h | 1 +
2 files changed, 73 insertions(+)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index d4b82c6305..e5096e01a7 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -785,6 +785,78 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
}
/*
+ * bms_add_range
+ * Add members in the range of 'lower' to 'upper' to the set.
+ *
+ * Note this could also be done by calling bms_add_member in a loop; however,
+ * using this function will be faster when the range is large, as we work
+ * at the bitmapword level rather than at the bit level.
+ */
+Bitmapset *
+bms_add_range(Bitmapset *a, int lower, int upper)
+{
+ int lwordnum,
+ lbitnum,
+ uwordnum,
+ ushiftbits,
+ wordnum;
+
+ if (lower < 0 || upper < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+ if (lower > upper)
+ elog(ERROR, "lower range must not be above upper range");
+ uwordnum = WORDNUM(upper);
+
+ if (a == NULL)
+ {
+ a = (Bitmapset *) palloc0(BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ }
+
+ /* ensure we have enough words to store the upper bit */
+ else if (uwordnum >= a->nwords)
+ {
+ int oldnwords = a->nwords;
+ int i;
+
+ a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ /* zero out the enlarged portion */
+ for (i = oldnwords; i < a->nwords; i++)
+ a->words[i] = 0;
+ }
+
+ wordnum = lwordnum = WORDNUM(lower);
+
+ lbitnum = BITNUM(lower);
+ ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+
+ /*
+ * Special case: when lwordnum is the same as uwordnum, both the lower and
+ * upper masks must be applied to the same word.
+ */
+ if (lwordnum == uwordnum)
+ {
+ a->words[lwordnum] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1)
+ & (~(bitmapword) 0) >> ushiftbits;
+ }
+ else
+ {
+ /* turn on lbitnum and all bits left of it */
+ a->words[wordnum++] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1);
+
+ /* turn on all bits for any intermediate words */
+ while (wordnum < uwordnum)
+ a->words[wordnum++] = ~(bitmapword) 0;
+
+ /* turn on upper's bit and all bits right of it. */
+ a->words[uwordnum] |= (~(bitmapword) 0) >> ushiftbits;
+ }
+
+ return a;
+}
+
+/*
* bms_int_members - like bms_intersect, but left input is recycled
*/
Bitmapset *
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index aa3fb253c2..3b62a97775 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,6 +90,7 @@ extern bool bms_is_empty(const Bitmapset *a);
extern Bitmapset *bms_add_member(Bitmapset *a, int x);
extern Bitmapset *bms_del_member(Bitmapset *a, int x);
extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_join(Bitmapset *a, Bitmapset *b);
--
2.11.0
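
For readers who want to try the word-level masking used by bms_add_range()
outside the backend, here is a minimal standalone sketch. It is not the
patch's code: it assumes a fixed 64-bit word and a caller-allocated array
instead of a Bitmapset, and the names set_bit_range and set_bit_range_slow
are made up for illustration. The slow variant sets one bit at a time and
exists only to cross-check the fast one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: 64-bit words, not PostgreSQL's bitmapword. */
#define BITS_PER_WORD 64
#define WORDNUM(x) ((x) / BITS_PER_WORD)
#define BITNUM(x)  ((x) % BITS_PER_WORD)

/*
 * Set bits lower..upper (inclusive) in words[].  The caller must ensure
 * 0 <= lower <= upper and that words[] has at least WORDNUM(upper) + 1
 * entries.
 */
static void
set_bit_range(uint64_t *words, int lower, int upper)
{
	int			lword = WORDNUM(lower);
	int			uword = WORDNUM(upper);
	/* bit BITNUM(lower) and all more significant bits */
	uint64_t	lmask = ~((UINT64_C(1) << BITNUM(lower)) - 1);
	/* bit BITNUM(upper) and all less significant bits */
	uint64_t	umask = ~UINT64_C(0) >> (BITS_PER_WORD - (BITNUM(upper) + 1));
	int			w;

	if (lword == uword)
	{
		/* the whole range lives in one word: apply both masks */
		words[lword] |= lmask & umask;
		return;
	}

	words[lword] |= lmask;
	/* intermediate words are covered entirely by the range */
	for (w = lword + 1; w < uword; w++)
		words[w] = ~UINT64_C(0);
	words[uword] |= umask;
}

/* Reference version: one bit at a time, used only for cross-checking. */
static void
set_bit_range_slow(uint64_t *words, int lower, int upper)
{
	int			i;

	for (i = lower; i <= upper; i++)
		words[WORDNUM(i)] |= UINT64_C(1) << BITNUM(i);
}
```

Both functions should leave the array in the same state for any valid
range; the fast one touches each word once rather than each bit once,
which is the point made in the comment above bms_add_range().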
0005-Planner-side-changes-for-partition-pruning-v12.patch (text/plain; charset=UTF-8)
From 98fe3598a4b2a97f4390ee2e771e7b4cef5f6d79 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 5/8] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers of
AppendRelInfo of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (note that as noted in 0, set_append_rel_size
now sets such information for only the *live* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it's
obvious in the planner that a set of clauses identified as matching
the partition key don't contain the constant values right away, in
which case, there is no need to call get_partitions_from_clauses
right away. Instead, it should be deferred to another piece of code
which can receive the above list of clauses and runs at a time when
the constant values become available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 20 ++
src/backend/optimizer/path/allpaths.c | 608 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/indxpath.c | 3 -
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 19 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 101 ++++++
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
11 files changed, 685 insertions(+), 133 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5c13793481..1fce54e432 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1528,6 +1528,26 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+
+ Assert(partclauses != NIL);
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 906d08ab37..6b087ec15f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,12 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -846,6 +857,399 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests against partition keys. contains_const and constfalse report
+ * whether pruning can be tried now and whether the quals are always false.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ Bitmapset *partindexes;
+ List *result = NIL;
+ int i;
+
+ /*
+ * If we have matched clauses that contain at least one constant
+ * operand, then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ else
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+
+ result = lappend(result, appinfo);
+ }
+
+ /* Record which partitions must be scanned. */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+ }
+
+ return NIL;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If clauses contains at least one constant operand or a Nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contain a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that the behavior with null
+ * arguments is predictable.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (!*contains_const)
+ *contains_const =
+ IsA(estimate_expression_value(root, rightop),
+ Const);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1264,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1278,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1315,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1328,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,85 +1338,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1164,6 +1508,19 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ * Note that rel itself (the parent) might just be a UNION ALL
+ * subquery, in which case there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1259,14 +1616,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1337,43 +1709,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, that list will contain only
+ * its immediate children, so collect live partitioned children from all
+ * children that are themselves partitioned and concatenate to our list
+ * before finally passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1390,17 +1759,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 18f6bafcdd..32de48128f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -40,9 +40,6 @@
#include "utils/selfuncs.h"
-#define IsBooleanOpfamily(opfamily) \
- ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
-
#define IndexCollMatchesExprColl(idxcollation, exprcollation) \
((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 453f25964a..b491fb9099 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EC members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f6b8bbf5fa..aaa342c2ab 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6192,14 +6192,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index cb94c318a7..a968fa4586 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +744,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1746,3 +1757,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 295e9d224e..2041de5bca 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9e68e65cc6..94c2e8d011 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the same
+ * order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
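
A note on the clause-matching code in the patch above: when the partition key appears as the right operand, the qual is flipped with the operator's commutator so that it reads "key op value", and the clause is skipped if no commutator exists. The following is only a minimal standalone sketch of that idea. The Qual struct, toy_commutator() and normalize_qual() are hypothetical names made up for illustration; operands are plain strings rather than planner nodes, and toy_commutator() merely stands in for get_commutator().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical qual of the form "left op right"; operands are plain names. */
typedef struct Qual
{
    const char *left;
    const char *op;
    const char *right;
} Qual;

/*
 * Toy commutator lookup (an assumption standing in for get_commutator()).
 * Returns NULL when the operator has no commutator.
 */
static const char *
toy_commutator(const char *op)
{
    static const char *pairs[][2] = {
        {"<", ">"}, {">", "<"}, {"<=", ">="}, {">=", "<="}, {"=", "="}
    };
    size_t      i;

    for (i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
        if (strcmp(op, pairs[i][0]) == 0)
            return pairs[i][1];
    return NULL;
}

/*
 * Rewrite "in" so that the partition key is the left operand.
 * Returns 1 and fills *out on success, 0 if the qual is unusable.
 */
int
normalize_qual(const Qual *in, const char *partkey, Qual *out)
{
    assert(in != NULL && partkey != NULL && out != NULL);

    if (strcmp(in->left, partkey) == 0)
    {
        *out = *in;             /* already "key op value" */
        return 1;
    }
    if (strcmp(in->right, partkey) == 0)
    {
        const char *comm = toy_commutator(in->op);

        if (comm == NULL)
            return 0;           /* cannot flip the operands, so give up */
        out->left = in->right;
        out->op = comm;
        out->right = in->left;
        return 1;
    }
    return 0;                   /* neither operand is the partition key */
}
```

With partition key "a", a qual such as "5 < a" would come out as "a > 5", while a qual whose operator has no commutator, or one that does not mention "a" at all, is rejected.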
Attachment: 0006-Implement-get_partitions_from_clauses-v12.patch (text/plain; charset=UTF-8)
From 923fcbdf2080e719194e821295e546a819cac09b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 6/8] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
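
Note (not part of the commit message): get_partitions_from_clauses_recurse() in this patch combines partition sets from mutually-ANDed clauses by intersection and those from the arguments of an OR clause by union. The following is only a rough standalone sketch of that combination rule, under simplifying assumptions: PartSet is a hypothetical 32-bit mask standing in for Bitmapset, and Clause is a made-up tree whose leaves already carry the set of partitions they match, which in the patch would come from get_partitions_for_keys().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means partition i is selected. */
typedef uint32_t PartSet;

typedef enum { CLAUSE_LEAF, CLAUSE_AND, CLAUSE_OR } ClauseKind;

/* Hypothetical clause tree; a leaf carries a precomputed partition set. */
typedef struct Clause
{
    ClauseKind  kind;
    PartSet     leaf_parts;     /* used only when kind == CLAUSE_LEAF */
    const struct Clause *args[4];   /* used by AND/OR nodes */
    int         nargs;
} Clause;

/* Set containing partitions 0 .. nparts - 1. */
static PartSet
all_partitions(int nparts)
{
    assert(nparts > 0 && nparts <= 32);
    return nparts == 32 ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/* Return the set of partitions that may satisfy the clause. */
PartSet
select_partitions(const Clause *clause, int nparts)
{
    PartSet     result;
    int         i;

    switch (clause->kind)
    {
        case CLAUSE_LEAF:
            return clause->leaf_parts & all_partitions(nparts);

        case CLAUSE_AND:
            /* Start with every partition and intersect per argument. */
            result = all_partitions(nparts);
            for (i = 0; i < clause->nargs; i++)
                result &= select_partitions(clause->args[i], nparts);
            return result;

        case CLAUSE_OR:
            /* Start with no partitions and take the union per argument. */
            result = 0;
            for (i = 0; i < clause->nargs; i++)
                result |= select_partitions(clause->args[i], nparts);
            return result;
    }
    return 0;                   /* not reached for valid input */
}
```

The sketch leaves out the partition-constraint refutation check and the default-partition handling that the real function performs.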
src/backend/catalog/partition.c | 1206 ++++++++++++++++++++++++++++++-
src/backend/optimizer/util/clauses.c | 4 +-
src/include/optimizer/clauses.h | 2 +
src/test/regress/expected/partition.out | 15 +-
4 files changed, 1210 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1fce54e432..04c4f034a9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,8 +40,11 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
+#include "parser/parse_coerce.h"
#include "rewrite/rewriteManip.h"
#include "storage/lmgr.h"
#include "utils/array.h"
@@ -130,6 +135,69 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, so we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -177,6 +245,25 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static int32 partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1529,7 +1616,7 @@ get_partition_qual_relid(Oid relid)
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1540,17 +1627,1128 @@ Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
List *partclauses)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
- Bitmapset *result = NULL;
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
Assert(partclauses != NIL);
- result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
return result;
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses);
+
+ /*
+ * The analysis of the matched clauses done by
+ * classify_partition_bounding_keys may have found mutually contradictory
+ * clauses.
+ */
+ if (!constfalse)
+ {
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys
+ * yet. They will be handled below by recursively calling this
+ * function for each of OR clauses' arguments and combining the
+ * resulting partition sets appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ }
+ else
+ return NULL;
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ Bitmapset *or_partset = NULL;
+
+ foreach(lc1, or->args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc1));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to or_partset. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation,
+ rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ or_partset = bms_union(or_partset, arg_partset);
+ }
+
+ /*
+ * Partition sets obtained from mutually-conjunctive clauses are
+ * combined using set intersection.
+ */
+ result = bms_intersect(result, or_partset);
+ }
+
+ return result;
+}
+
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) ||\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing equality operator, unless hash
+ * partitioning is in use, in which case, it's possible that some keys have
+ * IS NULL clauses while remaining have clauses with equality operator.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_or_clauses = true;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /* Set partexpr if needed. */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ partexpr = copyObject(lfirst(partexprs_item));
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is '<>', check if its
+ * negator is an equality operator. If so and it's a btree
+ * equality operator, we can use a special trick to prune
+ * partitions that won't satisfy the original '<>'
+ * operator -- we generate an OR expression
+ * 'leftop < rightop OR leftop > rightop' and add it to
+ * *or_clauses.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ {
+ Expr *or;
+
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ or = makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr), -1);
+ *or_clauses = lappend(*or_clauses, or);
+ continue;
+ }
+ }
+
+ /*
+ * Getting here means opclause uses an ordering op and
+ * hash partitioning is in use. We shouldn't try to
+ * reason about such an operator for the purposes of
+ * partition pruning, because hash partitioning doesn't
+ * make partitioning decisions based on relative ordering
+ * of keys.
+ */
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = get_commutator(opclause->opno);
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+
+ /*
+ * Since we only allow strict operators, require keys to be
+ * not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which while not
+ * listed as part of any operator family, we are able to
+ * handle the same if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /* Use a separate counter; i indexes the partition key columns. */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does leftop match with this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ only_or_clauses = false;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_or_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ int32 op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ if (op_strategy < 0 &&
+ need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ else if (op_strategy == 0)
+ {
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ }
+ else if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found the same for all partition key columns.
+ * If present, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
+ * =, and >/>= operator, respectively. Sets *incl to true if equality is
+ * implied.
+ */
+static int32
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ int32 result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = 0;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = -1;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = 0;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = 1;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *datum to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ /*
+ * If we couldn't coerce to the partition key type, that is, the type of
+ * datums stored in PartitionBoundInfo, no hope of using this
+ * expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * so add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with the same. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ Bitmapset *result = NULL;
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index d14ef31eae..72f1fa30a6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4744,7 +4742,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 963561dbfe..d44ff4f608 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -1049,16 +1049,11 @@ explain (costs off) select * from boolpart where a is not true;
(7 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
--
2.11.0
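
For anyone reading along without applying the patch, here is a rough
stand-alone sketch of how get_partitions_from_clauses_recurse() combines
partition sets: the sets from the arguments of one OR clause are unioned,
and the result is intersected with the set selected by the remaining
(ANDed) clauses. It is not part of the patch. It uses a plain uint32_t
bitmask in place of Bitmapset, and the names partset_t, or_group and
combine_partsets are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means partition i is selected. */
typedef uint32_t partset_t;

/* One OR clause: each element is the partition set matched by one OR argument. */
typedef struct or_group
{
    const partset_t *arg_sets;
    size_t      nargs;
} or_group;

/*
 * Start from the set selected by the plain (non-OR) clauses, then intersect
 * it with the union of each OR clause's per-argument sets.
 */
static partset_t
combine_partsets(partset_t base, const or_group *groups, size_t ngroups)
{
    partset_t   result = base;
    size_t      g,
                a;

    /* Once the set is empty, no further conjunct can add partitions back. */
    for (g = 0; g < ngroups && result != 0; g++)
    {
        partset_t   or_set = 0;

        for (a = 0; a < groups[g].nargs; a++)
            or_set |= groups[g].arg_sets[a];    /* OR arguments: set union */

        result &= or_set;       /* ANDed clauses: set intersection */
    }

    return result;
}
```

With four partitions all selected by the plain clauses (0xF) and one OR
clause whose arguments select partitions 0 and 2, only those two
partitions remain in the combined set.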
0007-Some-interface-changes-for-partition_bound_-cmp-bsea-v12.patch (text/plain; charset=UTF-8)
From 191b45a2a3e9b92fe4e8cf623d02280ad8bd66a5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 7/8] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
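
To illustrate why the number of datums needs to travel with the probe, here
is a simplified sketch of comparing a stored range bound against a probe
that specifies only a prefix of the key columns. It is only an
approximation: the real partition_rbound_datum_cmp() also has to consider
rb_kind and calls the partitioning support functions, whereas this sketch
assumes plain int keys. The names cmp_arg and bound_prefix_cmp are
hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical, simplified probe: a tuple prefix of ndatums key values. */
typedef struct cmp_arg
{
    const int  *datums;
    int         ndatums;
} cmp_arg;

/*
 * Compare a stored bound against the probe, looking only at the first
 * arg->ndatums columns.  Returns <0, 0 or >0 if the bound sorts before,
 * equal to, or after the probe on that prefix.
 */
static int
bound_prefix_cmp(const int *bound_datums, const cmp_arg *arg)
{
    int         i;

    for (i = 0; i < arg->ndatums; i++)
    {
        if (bound_datums[i] < arg->datums[i])
            return -1;
        if (bound_datums[i] > arg->datums[i])
            return 1;
    }

    /* All compared columns matched; trailing bound columns are ignored. */
    return 0;
}
```

A bound (10, 20) compares equal to a one-column probe (10), which is what
lets a query that constrains only the leading key column locate the first
candidate bound.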
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 04c4f034a9..a75816d3ca 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -198,6 +198,31 @@ typedef struct PartScanKeyInfo
bool keyisnotnull[PARTITION_MAX_KEYS];
} PartScanKeyInfo;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -230,14 +255,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -1066,6 +1092,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -1080,8 +1108,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1154,10 +1188,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1208,6 +1248,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1229,8 +1270,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1244,9 +1288,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3746,12 +3790,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -3780,11 +3827,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -3992,12 +4043,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -4019,11 +4070,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -4032,25 +4083,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -4061,12 +4142,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -4080,20 +4162,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -4106,8 +4187,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
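
For readers following along, the search contract that patch 0007 keeps (return the greatest bound that is less than or equal to the probe, or -1 if every bound is greater, and report equality through *is_equal) can be sketched in standalone C. This is only an illustration under simplifying assumptions, not the patch's code: ProbeArg, bound_cmp and bound_bsearch are hypothetical stand-ins that compare plain ints instead of Datums, and the struct merely mirrors the is_bound / datums / ndatums idea of PartitionBoundCmpArg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PartitionBoundCmpArg. */
typedef struct ProbeArg
{
    bool        is_bound;   /* true: probe is *bound; false: probe is datums[] */
    const int  *bound;      /* single bound value, used when is_bound is true */
    const int  *datums;     /* tuple key values, used when is_bound is false */
    int         ndatums;    /* number of valid entries in datums */
} ProbeArg;

/* Returns <0, 0 or >0 as bounds[offset] is <, = or > the probe in *arg. */
static int
bound_cmp(const int *bounds, int offset, const ProbeArg *arg)
{
    int         probe;

    assert(arg != NULL);
    if (arg->is_bound)
        probe = *arg->bound;
    else if (arg->ndatums >= 1)
        probe = arg->datums[0];
    else
        return 1;           /* no datum supplied: treat the bound as greater */

    return (bounds[offset] > probe) - (bounds[offset] < probe);
}

/*
 * Binary search over a sorted array of bounds.  Returns the index of the
 * greatest bound that is <= the probe, or -1 if all bounds are greater.
 */
static int
bound_bsearch(const int *bounds, int nbounds, const ProbeArg *arg,
              bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;
        int         cmpval = bound_cmp(bounds, mid, arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Carrying ndatums alongside datums is what lets a caller hand over fewer values than the partition key has columns, which the old void *probe / bool probe_is_bound pair could not express.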
Attachment: 0008-Implement-get_partitions_for_keys-v12.patch (text/plain; charset=UTF-8)
From 6bb219a92945f32f0f00b21ae8575c1f063cdc06 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 8/8] Implement get_partitions_for_keys
Disable constraint_exclusion pruning using internal partition
constraints for select queries, because we've now got the new pruning
working.
---
src/backend/catalog/partition.c | 432 +++++++++++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 29 ++-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition.out | 345 +++++++++++++++++++++----
src/test/regress/sql/partition.sql | 47 +++-
5 files changed, 789 insertions(+), 72 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index a75816d3ca..cbb9886ad4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2784,10 +2784,438 @@ partition_cmp_args(PartitionKey key, int partattoff,
static Bitmapset *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
Bitmapset *result = NULL;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool hash_isnull[PARTITION_MAX_KEYS];
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return NULL;
+
+ memset(hash_isnull, false, sizeof(hash_isnull));
+ /* Handle null partition keys. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ switch (partkey->strategy)
+ {
+ /*
+ * Hash partitioning puts nulls into a normal partition
+ * and doesn't require defining a special null-accepting
+ * partition. So, we let this fall through and get
+ * handled by the code below that handles equality
+ * keys.
+ */
+ case PARTITION_STRATEGY_HASH:
+ hash_isnull[i] = true;
+ keys->n_eqkeys++;
+ break;
+
+ /*
+ * In range and list partitioning cases, only a designated
+ * partition will accept nulls.
+ */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ if (partition_bound_accepts_nulls(boundinfo)||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ result = bms_make_singleton(other_idx);
+ return result;
+ }
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+ return result;
+ }
+
+
+ /*
+ * Determine set of partitions using provided keys, which proceeds in a
+ * manner determined by the partitioning method.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ switch (partkey->strategy)
+ {
+ /* Hash-partitioning is real simple. */
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys,
+ hash_isnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ /* There is no such thing as "partition not defined" here! */
+ Assert(result_index >= 0);
+ result = bms_make_singleton(result_index);
+
+ return result;
+ }
+
+ /* Range and list partitioning take a bit more work. */
+
+ case PARTITION_STRATEGY_LIST:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /* For list partition, must exactly match the datum. */
+ if (eqoff >= 0 && !is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (eqoff >= 0)
+ eqoff += 1;
+ break;
+ }
+
+ /*
+ * Ask later code to include the default partition, because eqkeys
+ * didn't identify a specific partition or identified a range
+ * of unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ result = bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return result;
+ }
+
+ /*
+ * Hash partitioning doesn't understand non-equality conditions, so
+ * return all partitions.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case,
+ * set minoff to the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if
+ * it does but minkey isn't inclusive, move to the bound on
+ * the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bounds share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged,
+ * because it simply means none of the partitions satisfies
+ * maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal),
+ * but the maxkey is not inclusive, then go to the bound on the
+ * left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
- result = bms_add_range(result, 0, partdesc->nparts - 1);
+ /*
+ * maxoff may have become -1, which again means no partition
+ * satisfies the maxkeys.
+ */
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * All datums between those at minoff and maxoff satisfy the
+ * query keys, so add the corresponding partitions to the
+ * result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition
+ * if certain keys could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ }
+
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
return result;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index e1ef936e68..6826f5fc9d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1248,21 +1247,25 @@ get_relation_constraints(PlannerInfo *root,
}
/* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index fac7b62f9c..5a74151c8f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1904,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index d44ff4f608..400e97eb94 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -120,6 +120,8 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
QUERY PLAN
-------------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_ef
@@ -128,12 +130,14 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-(9 rows)
+(11 rows)
explain (costs off) select * from lp where a not in ('a', 'd');
QUERY PLAN
------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_bc
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_ef
@@ -142,7 +146,7 @@ explain (costs off) select * from lp where a not in ('a', 'd');
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_default
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-(9 rows)
+(11 rows)
-- collation matches the partitioning collation, pruning works
create table coll_pruning (a text collate "C") partition by list (a);
@@ -208,16 +212,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +521,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +575,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +649,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +657,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -692,7 +688,9 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
Append
-> Seq Scan on mc3p0
Filter: ((a = 1) AND (abs(b) < 1))
-(3 rows)
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) < 1))
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
QUERY PLAN
@@ -714,9 +712,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -813,12 +809,14 @@ explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
(3 rows)
explain (costs off) select * from mc3p where a > 20;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+--------------------------------
Append
-> Seq Scan on mc3p7
Filter: (a > 20)
-(3 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 20)
+(5 rows)
explain (costs off) select * from mc3p where a >= 20;
QUERY PLAN
@@ -844,7 +842,9 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
-(7 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(9 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
QUERY PLAN
@@ -858,7 +858,9 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
-(9 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
QUERY PLAN
@@ -886,6 +888,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -896,7 +900,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -957,9 +961,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1001,28 +1007,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1032,21 +1030,15 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
QUERY PLAN
@@ -1079,4 +1071,253 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- some more cases
+-- pruning for partitioned table appearing inside a sub-query
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
index 9dfcbe1e70..2a623afd2f 100644
--- a/src/test/regress/sql/partition.sql
+++ b/src/test/regress/sql/partition.sql
@@ -152,4 +152,49 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- some more cases
+
+-- pruning for partitioned table appearing inside a sub-query
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
--
2.11.0
On 17 November 2017 at 23:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Please find attached updated patch set. There are significant changes in
this version as described below, including the support for hash
partitioned tables.
Hi Amit,
Thanks for making those changes and adding the HASH partition support.
There's a good chance that I'm not going to get time to look at this maybe
until the last day of the month. I hope someone else can look over it in
the meantime.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2017/11/17 21:44, David Rowley wrote:
On 17 November 2017 at 23:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Please find attached updated patch set. There are significant changes in
this version as described below, including the support for hash
partitioned tables.
Hi Amit,
Thanks for making those changes and adding the HASH partition support.
There's a good chance that I'm not going to get time to look at this maybe
until the last day of the month. I hope someone else can look over it in
the meantime.
No problem. Thanks for the review so far; I look forward to the next time
you're able to comment.
Regards,
Amit
Thank you and sorry for the confused comments.
At Mon, 13 Nov 2017 18:46:28 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <8460a6c3-68c5-b78a-7d18-d253180f2188@lab.ntt.co.jp>
Horiguchi-san,
Thanks for taking a look. Replying to all your emails here.
In 0003,
+match_clauses_to_partkey(RelOptInfo *rel,
...
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ continue;
We call this function in both conjunctive and disjunctive
contexts (the latter only in the recursive case). constfalse ==
true means no clauses are needed in the former case.
Since (I think) just a list of RestrictInfo is expected to be
treated as a conjunction (it's quite dubious, though..),
I think it makes sense to consider a list of RestrictInfo's, such as
baserestrictinfo, that is passed as input to match_clauses_to_partkey(),
to be mutually conjunctive for our purpose here.
You're right, and I understand that. I'm OK with leaving it as is, since I
recalled that clause_selectivity has similar code.
On 2017/11/10 14:44, Kyotaro HORIGUCHI wrote:
At Fri, 10 Nov 2017 14:38:11 +0900, Kyotaro HORIGUCHI wrote:
This is working fine. Sorry for the bogus comment.
I'd almost started looking around if something might be wrong after all. :)
Very sorry for the wrong comment:(
On 2017/11/10 16:07, Kyotaro HORIGUCHI wrote:
At Fri, 10 Nov 2017 14:44:55 +0900, Kyotaro HORIGUCHI wrote:
Those two conditions are not orthogonal. Maybe something like the
following would be more understandable.
if (!constfalse)
{
    /* No constraints on the keys, so, return *all* partitions. */
    if (nkeys == 0)
        return bms_add_range(result, 0, partdesc->nparts - 1);
    result = get_partitions_for_keys(relation, &keys);
}
So, the condition (!constfalse && nkeys == 0) cannot return
there. I'm badly confused by the variable name.
Do you mean 'constfalse'?
Perhaps. The name "constfalse" suggests (to me) that the
clauses always evaluate to false. But actually it means only that
the clauses not included in the returned list evaluate to false. Anyway, I'll
take a look at v12 and comment then.
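For readers following the control-flow discussion, the structure suggested in the quoted snippet can be sketched in isolation. This is only an illustration under stated assumptions, not the patch's code: a 32-bit mask stands in for the Bitmapset, the `key_matches` argument stands in for whatever get_partitions_for_keys() would return, and every name beginning with `sketch_` (plus `PartSetSketch`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset of partition indexes: bit i set
 * means "scan partition i".  Supports at most 32 partitions.
 */
typedef uint32_t PartSetSketch;

/* All partitions 0 .. nparts-1, in the spirit of bms_add_range(). */
static PartSetSketch
sketch_all_partitions(int nparts)
{
    assert(nparts >= 0 && nparts <= 32);
    return (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/*
 * Sketch of the suggested structure.  key_matches is an assumption: it
 * stands in for the result of looking up partitions by key.
 */
static PartSetSketch
sketch_prune(bool constfalse, int nkeys, int nparts,
             PartSetSketch key_matches)
{
    PartSetSketch result = 0;   /* empty set: nothing to scan */

    if (!constfalse)
    {
        /* No constraints on the keys, so return *all* partitions. */
        if (nkeys == 0)
            return sketch_all_partitions(nparts);

        /* Otherwise keep only the partitions selected by the keys. */
        result = key_matches & sketch_all_partitions(nparts);
    }

    return result;
}
```

Written this way, the (!constfalse && nkeys == 0) case returns early, and a constant-false qual always falls through to the empty set, regardless of how many keys were matched.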
I couldn't find another reasonable structure using the current
classify_p_b_keys(), but could you add a comment like the
following as an example?
OK, will add comments explaining what's going on.
Will post the updated patches after also taking care of David's and Amul's
review comments upthread.
Thanks,
Amit
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Fri, Nov 17, 2017 at 3:31 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:
Support for hash partitioning and tests for the same. Also, since
update/delete on partitioned tables still depend on constraint exclusion
for pruning, fix things such that get_relation_constraints includes
partition constraints in its result only for non-select queries (for
selects we have the new pruning code). Other bug fixes.Hi Amit,
I have applied attached patch set on commit
11e264517dff7a911d9e6494de86049cab42cde3 and try to test for hash
partition. I got a server crash with below test.
CREATE TABLE hp_tbl (a int, b int, c int) PARTITION BY HASH (a);
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 4,
remainder 0) PARTITION BY HASH (b);
CREATE TABLE hp_tbl_p1_p1 PARTITION OF hp_tbl_p1 FOR VALUES WITH (modulus
4, remainder 0) PARTITION BY HASH (c);
CREATE TABLE hp_tbl_p1_p1_p1 PARTITION OF hp_tbl_p1_p1 FOR VALUES WITH
(modulus 4, remainder 0);
CREATE TABLE hp_tbl_p2 PARTITION OF hp_tbl FOR VALUES WITH (modulus 4,
remainder 1) PARTITION BY HASH (b);
CREATE TABLE hp_tbl_p2_p1 PARTITION OF hp_tbl_p2 FOR VALUES WITH (modulus
4, remainder 1);
CREATE TABLE hp_tbl_p2_p2 PARTITION OF hp_tbl_p2 FOR VALUES WITH (modulus
4, remainder 2);
insert into hp_tbl select i,i,i from generate_series(0,10)i where i not
in(2,4,6,7,10);
explain select * from hp_tbl where a = 2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Thanks Rajkumar for the test.
On 2017/11/21 19:06, Rajkumar Raghuwanshi wrote:
CREATE TABLE hp_tbl (a int, b int, c int) PARTITION BY HASH (a);
CREATE TABLE hp_tbl_p1 PARTITION OF hp_tbl FOR VALUES WITH (modulus 4,
remainder 0) PARTITION BY HASH (b);
CREATE TABLE hp_tbl_p1_p1 PARTITION OF hp_tbl_p1 FOR VALUES WITH (modulus
4, remainder 0) PARTITION BY HASH (c);
CREATE TABLE hp_tbl_p1_p1_p1 PARTITION OF hp_tbl_p1_p1 FOR VALUES WITH
(modulus 4, remainder 0);
CREATE TABLE hp_tbl_p2 PARTITION OF hp_tbl FOR VALUES WITH (modulus 4,
remainder 1) PARTITION BY HASH (b);
CREATE TABLE hp_tbl_p2_p1 PARTITION OF hp_tbl_p2 FOR VALUES WITH (modulus
4, remainder 1);
CREATE TABLE hp_tbl_p2_p2 PARTITION OF hp_tbl_p2 FOR VALUES WITH (modulus
4, remainder 2);
insert into hp_tbl select i,i,i from generate_series(0,10)i where i not
in(2,4,6,7,10);
explain select * from hp_tbl where a = 2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Fixed in the attached. No other changes besides that.
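To make the invalid assumption concrete, here is a minimal sketch (not the code in the attached patches) of a remainder-to-partition lookup in which some remainders of the largest modulus have no partition. `HashBoundSketch` and `sketch_hash_partition_for` are hypothetical names, and the convention of storing -1 for a missing partition is an assumption made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bound info of a hash-partitioned table. */
typedef struct HashBoundSketch
{
    int         greatest_modulus;   /* largest modulus among partitions */
    const int  *indexes;            /* indexes[r]: partition accepting
                                     * remainder r, or -1 if none exists */
} HashBoundSketch;

/*
 * Return the index of the partition that accepts rowhash, or -1 if no
 * such partition has been created.
 */
static int
sketch_hash_partition_for(const HashBoundSketch *bound, uint64_t rowhash)
{
    int         remainder;

    assert(bound != NULL && bound->greatest_modulus > 0);
    remainder = (int) (rowhash % (uint64_t) bound->greatest_modulus);

    /*
     * Deliberately no assertion that indexes[remainder] >= 0: a partition
     * for this remainder need not exist, so callers must handle -1.
     */
    return bound->indexes[remainder];
}
```

With a table shaped like the top level of hp_tbl in the report (modulus 4, partitions only for remainders 0 and 1), the array would be {0, 1, -1, -1}, and a hash value whose remainder is 2 or 3 simply maps to no partition rather than tripping an assertion.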
Thanks,
Amit
Attachments:
0001-Add-default-partition-case-in-inheritance-testing.patch
From 6e7e1cd67404eabbeaf67ee4fd72b6b02bfa23c9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 17 Nov 2017 14:00:42 +0900
Subject: [PATCH 1/8] Add default partition case in inheritance testing
---
src/test/regress/expected/inherit.out | 29 +++++++++++++++++++----------
src/test/regress/sql/inherit.sql | 9 +++++----
2 files changed, 24 insertions(+), 14 deletions(-)
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..a202caeb25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1853,13 +1853,14 @@ drop table range_list_parted;
-- check that constraint exclusion is able to cope with the partition
-- constraint emitted for multi-column range partitioned tables
create table mcrparted (a int, b int, c int) partition by range (a, abs(b), c);
+create table mcrparted_def partition of mcrparted default;
create table mcrparted0 partition of mcrparted for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
create table mcrparted1 partition of mcrparted for values from (1, 1, 1) to (10, 5, 10);
create table mcrparted2 partition of mcrparted for values from (10, 5, 10) to (10, 10, 10);
create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20, 10, 10);
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
-explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
+explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
QUERY PLAN
------------------------------
Append
@@ -1867,7 +1868,7 @@ explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
Filter: (a = 0)
(3 rows)
-explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1
+explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
QUERY PLAN
---------------------------------------------
Append
@@ -1875,7 +1876,7 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scan
Filter: ((a = 10) AND (abs(b) < 5))
(3 rows)
-explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2
+explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
QUERY PLAN
---------------------------------------------
Append
@@ -1883,11 +1884,13 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scan
Filter: ((a = 10) AND (abs(b) = 5))
-> Seq Scan on mcrparted2
Filter: ((a = 10) AND (abs(b) = 5))
-(5 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((a = 10) AND (abs(b) = 5))
+(7 rows)
explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all partitions
- QUERY PLAN
-------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on mcrparted0
Filter: (abs(b) = 5)
@@ -1899,7 +1902,9 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-(11 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (abs(b) = 5)
+(13 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
@@ -1917,7 +1922,9 @@ explain (costs off) select * from mcrparted where a > -1; -- scans all partition
Filter: (a > '-1'::integer)
-> Seq Scan on mcrparted5
Filter: (a > '-1'::integer)
-(13 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (a > '-1'::integer)
+(15 rows)
explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c > 10; -- scans mcrparted4
QUERY PLAN
@@ -1927,7 +1934,7 @@ explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c >
Filter: ((c > 10) AND (a = 20) AND (abs(b) = 10))
(3 rows)
-explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5
+explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5, mcrparted_def
QUERY PLAN
-----------------------------------------
Append
@@ -1937,7 +1944,9 @@ explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mc
Filter: ((c > 20) AND (a = 20))
-> Seq Scan on mcrparted5
Filter: ((c > 20) AND (a = 20))
-(7 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((c > 20) AND (a = 20))
+(9 rows)
drop table mcrparted;
-- check that partitioned table Appends cope with being referenced in
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
index 169d0dc0f5..c71febffc2 100644
--- a/src/test/regress/sql/inherit.sql
+++ b/src/test/regress/sql/inherit.sql
@@ -664,19 +664,20 @@ drop table range_list_parted;
-- check that constraint exclusion is able to cope with the partition
-- constraint emitted for multi-column range partitioned tables
create table mcrparted (a int, b int, c int) partition by range (a, abs(b), c);
+create table mcrparted_def partition of mcrparted default;
create table mcrparted0 partition of mcrparted for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
create table mcrparted1 partition of mcrparted for values from (1, 1, 1) to (10, 5, 10);
create table mcrparted2 partition of mcrparted for values from (10, 5, 10) to (10, 10, 10);
create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20, 10, 10);
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
-explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0
-explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1
-explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2
+explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
+explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
+explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all partitions
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
explain (costs off) select * from mcrparted where a = 20 and abs(b) = 10 and c > 10; -- scans mcrparted4
-explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5
+explain (costs off) select * from mcrparted where a = 20 and c > 20; -- scans mcrparted3, mcrparte4, mcrparte5, mcrparted_def
drop table mcrparted;
-- check that partitioned table Appends cope with being referenced in
--
2.11.0
0002-Tweak-default-range-partition-s-constraint-a-little.patch
From 2cd9696a399fcedcda3240b15ef14e9598f7b5ed Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 31 Oct 2017 16:26:55 +0900
Subject: [PATCH 2/8] Tweak default range partition's constraint a little
When it is used as a predicate, it's useful for the constraint to say
explicitly that the default range partition might contain nulls, because
non-default range partitions can't.
---
src/backend/catalog/partition.c | 29 +++++++++++++++++++++++------
src/test/regress/expected/inherit.out | 12 ++++++++----
src/test/regress/expected/update.out | 2 +-
3 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 9a44cceb22..e032c11ed4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2134,12 +2134,29 @@ get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
if (or_expr_args != NIL)
{
- /* OR all the non-default partition constraints; then negate it */
- result = lappend(result,
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args, -1)
- : linitial(or_expr_args));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ Expr *other_parts_constr;
+
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
return result;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a202caeb25..fac7b62f9c 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1861,12 +1861,14 @@ create table mcrparted3 partition of mcrparted for values from (11, 1, 1) to (20
create table mcrparted4 partition of mcrparted for values from (20, 10, 10) to (20, 20, 20);
create table mcrparted5 partition of mcrparted for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
explain (costs off) select * from mcrparted where a = 0; -- scans mcrparted0, mcrparted_def
- QUERY PLAN
-------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on mcrparted0
Filter: (a = 0)
-(3 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: (a = 0)
+(5 rows)
explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scans mcrparted1, mcrparted_def
QUERY PLAN
@@ -1874,7 +1876,9 @@ explain (costs off) select * from mcrparted where a = 10 and abs(b) < 5; -- scan
Append
-> Seq Scan on mcrparted1
Filter: ((a = 10) AND (abs(b) < 5))
-(3 rows)
+ -> Seq Scan on mcrparted_def
+ Filter: ((a = 10) AND (abs(b) < 5))
+(5 rows)
explain (costs off) select * from mcrparted where a = 10 and abs(b) = 5; -- scans mcrparted1, mcrparted2, mcrparted_def
QUERY PLAN
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index a4fe96112e..b69ceaa75e 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -227,7 +227,7 @@ create table part_def partition of range_parted default;
a | text | | | | extended | |
b | integer | | | | plain | |
Partition of: range_parted DEFAULT
-Partition constraint: (NOT (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20))))
+Partition constraint: (NOT ((a IS NOT NULL) AND (b IS NOT NULL) AND (((a = 'a'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'a'::text) AND (b >= 10) AND (b < 20)) OR ((a = 'b'::text) AND (b >= 1) AND (b < 10)) OR ((a = 'b'::text) AND (b >= 10) AND (b < 20)))))
insert into range_parted values ('c', 9);
-- ok
--
2.11.0
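The effect of the 0002 change on the default partition's constraint (visible in the update.out hunk above, where the IS NOT NULL tests are added inside the NOT) can be illustrated with SQL's three-valued logic. The following is a minimal sketch, assuming a single integer key and one non-default partition covering [1, 10). All type and function names are made up for illustration and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* SQL-style three-valued truth value. */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* A nullable integer key value (hypothetical helper type). */
typedef struct { bool isnull; int value; } NullableInt;

static TriBool
tv_not(TriBool x)
{
    if (x == TV_UNKNOWN)
        return TV_UNKNOWN;
    return (x == TV_TRUE) ? TV_FALSE : TV_TRUE;
}

static TriBool
tv_and(TriBool x, TriBool y)
{
    if (x == TV_FALSE || y == TV_FALSE)
        return TV_FALSE;
    if (x == TV_UNKNOWN || y == TV_UNKNOWN)
        return TV_UNKNOWN;
    return TV_TRUE;
}

/* "a >= lo AND a < hi"; any comparison against NULL is unknown. */
static TriBool
in_range(NullableInt a, int lo, int hi)
{
    if (a.isnull)
        return TV_UNKNOWN;
    return (a.value >= lo && a.value < hi) ? TV_TRUE : TV_FALSE;
}

/* "a IS NOT NULL" never yields unknown. */
static TriBool
is_not_null(NullableInt a)
{
    return a.isnull ? TV_FALSE : TV_TRUE;
}

/* Old shape: NOT (a >= 1 AND a < 10) */
static TriBool
old_default_constraint(NullableInt a)
{
    return tv_not(in_range(a, 1, 10));
}

/* New shape: NOT ((a IS NOT NULL) AND (a >= 1 AND a < 10)) */
static TriBool
new_default_constraint(NullableInt a)
{
    return tv_not(tv_and(is_not_null(a), in_range(a, 1, 10)));
}
```

For a NULL key the old shape evaluates to unknown, while the new shape evaluates to true, which is what lets the expression state positively that NULLs belong to the default partition. For non-NULL keys the two shapes agree.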
0003-Add-new-tests-for-partition-pruning-v13.patch
From b6d6d22e886119f06f3d77d7a651a4fbdcfd33fe Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 5 Sep 2017 10:28:23 +0900
Subject: [PATCH 3/8] Add new tests for partition-pruning
---
src/test/regress/expected/partition.out | 1087 +++++++++++++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition.sql | 155 +++++
4 files changed, 1244 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition.out
create mode 100644 src/test/regress/sql/partition.sql
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
new file mode 100644
index 0000000000..963561dbfe
--- /dev/null
+++ b/src/test/regress/expected/partition.out
@@ -0,0 +1,1087 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+ QUERY PLAN
+------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ -> Seq Scan on lp_bc
+ -> Seq Scan on lp_ef
+ -> Seq Scan on lp_g
+ -> Seq Scan on lp_null
+ -> Seq Scan on lp_default
+(7 rows)
+
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+ QUERY PLAN
+-----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a > 'a'::bpchar) AND (a <= 'd'::bpchar))
+(7 rows)
+
+explain (costs off) select * from lp where a = 'a';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a = 'a'::bpchar)
+(3 rows)
+
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ('a'::bpchar = a)
+(3 rows)
+
+explain (costs off) select * from lp where a is not null;
+ QUERY PLAN
+---------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_bc
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_ef
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_g
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on lp_default
+ Filter: (a IS NOT NULL)
+(11 rows)
+
+explain (costs off) select * from lp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on lp_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a = 'a'::bpchar) OR (a = 'c'::bpchar))
+(5 rows)
+
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND ((a = 'a'::bpchar) OR (a = 'c'::bpchar)))
+(5 rows)
+
+explain (costs off) select * from lp where a <> 'g';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'g'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'g'::bpchar)
+(9 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
+(9 rows)
+
+explain (costs off) select * from lp where a not in ('a', 'd');
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_ef
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_g
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+ -> Seq Scan on lp_default
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
+(9 rows)
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: (a = 'a'::text COLLATE "C")
+(3 rows)
+
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_a
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_b
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_def
+ Filter: ((a)::text = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+explain (costs off) select * from rlp where a < 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 1)
+(3 rows)
+
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (1 > a)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 1)
+ -> Seq Scan on rlp2
+ Filter: (a <= 1)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 1)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = 1)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (a = '1'::bigint)
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3abcd
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3efgh
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp3_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_2
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp4_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_1
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp5_default
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_10
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_30
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a)::numeric = '1'::numeric)
+(31 rows)
+
+explain (costs off) select * from rlp where a <= 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 10)
+ -> Seq Scan on rlp2
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 10)
+(9 rows)
+
+explain (costs off) select * from rlp where a > 10;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a > 10)
+ -> Seq Scan on rlp3efgh
+ Filter: (a > 10)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a > 10)
+ -> Seq Scan on rlp3_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_2
+ Filter: (a > 10)
+ -> Seq Scan on rlp4_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_1
+ Filter: (a > 10)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_30
+ Filter: (a > 10)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 10)
+(23 rows)
+
+explain (costs off) select * from rlp where a < 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a < 15)
+ -> Seq Scan on rlp2
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a < 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a < 15)
+(9 rows)
+
+explain (costs off) select * from rlp where a <= 15;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 15)
+ -> Seq Scan on rlp2
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 15)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 15)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 15)
+(17 rows)
+
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 15) AND ((b)::text = 'ab'::text))
+(17 rows)
+
+explain (costs off) select * from rlp where a = 16;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (a = 16)
+ -> Seq Scan on rlp3efgh
+ Filter: (a = 16)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a = 16)
+ -> Seq Scan on rlp3_default
+ Filter: (a = 16)
+(9 rows)
+
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: ((a = 16) AND ((b)::text = ANY ('{not,in,here}'::text[])))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+ QUERY PLAN
+---------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text < 'ab'::text) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: (((b)::text <= 'ab'::text) AND (a = 16))
+(5 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is null;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NULL) AND (a = 16))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 16 and b is not null;
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((b IS NOT NULL) AND (a = 16))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND (a = 16))
+(9 rows)
+
+explain (costs off) select * from rlp where a is null;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on rlp_default_null
+ Filter: (a IS NULL)
+(3 rows)
+
+explain (costs off) select * from rlp where a is not null;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3abcd
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3efgh
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp3_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_2
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp4_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_1
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp5_default
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_10
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_30
+ Filter: (a IS NOT NULL)
+ -> Seq Scan on rlp_default_default
+ Filter: (a IS NOT NULL)
+(29 rows)
+
+explain (costs off) select * from rlp where a > 30;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp5_1
+ Filter: (a > 30)
+ -> Seq Scan on rlp5_default
+ Filter: (a > 30)
+ -> Seq Scan on rlp_default_default
+ Filter: (a > 30)
+(7 rows)
+
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+ QUERY PLAN
+----------------------------------
+ Append
+ -> Seq Scan on rlp_default_30
+ Filter: (a = 30)
+(3 rows)
+
+explain (costs off) select * from rlp where a <= 31;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3abcd
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3efgh
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3nullxy
+ Filter: (a <= 31)
+ -> Seq Scan on rlp3_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_2
+ Filter: (a <= 31)
+ -> Seq Scan on rlp4_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_1
+ Filter: (a <= 31)
+ -> Seq Scan on rlp5_default
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_10
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_30
+ Filter: (a <= 31)
+ -> Seq Scan on rlp_default_default
+ Filter: (a <= 31)
+(29 rows)
+
+explain (costs off) select * from rlp where a = 1 or a = 7;
+ QUERY PLAN
+--------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR (a = 7))
+(3 rows)
+
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on rlp1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp3abcd
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_2
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp4_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_1
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp5_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_10
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_null
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a = 1) OR ((b)::text = 'ab'::text))
+(25 rows)
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 20) AND (a < 27))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 20) AND (a < 27))
+(7 rows)
+
+explain (costs off) select * from rlp where a = 29;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a = 29)
+(3 rows)
+
+explain (costs off) select * from rlp where a >= 29;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on rlp4_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_1
+ Filter: (a >= 29)
+ -> Seq Scan on rlp5_default
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_30
+ Filter: (a >= 29)
+ -> Seq Scan on rlp_default_default
+ Filter: (a >= 29)
+(11 rows)
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+ QUERY PLAN
+----------------------------------------
+ Append
+ -> Seq Scan on rlp_default_10
+ Filter: ((a > 1) AND (a = 10))
+(3 rows)
+
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rlp3abcd
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3efgh
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3nullxy
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_2
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp4_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_1
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp5_default
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_30
+ Filter: ((a > 1) AND (a >= 15))
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 1) AND (a >= 15))
+(23 rows)
+
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+ QUERY PLAN
+-------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp2
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3abcd
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3efgh
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3nullxy
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+ -> Seq Scan on rlp3_default
+ Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
+(11 rows)
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+explain (costs off) select * from mc3p where a = 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a = 1)
+ -> Seq Scan on mc3p1
+ Filter: (a = 1)
+ -> Seq Scan on mc3p_default
+ Filter: (a = 1)
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) < 1))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+ QUERY PLAN
+--------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
+(7 rows)
+
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 10) AND (abs(b) >= 5) AND (abs(b) <= 35))
+(11 rows)
+
+explain (costs off) select * from mc3p where a > 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a > 10)
+ -> Seq Scan on mc3p6
+ Filter: (a > 10)
+ -> Seq Scan on mc3p7
+ Filter: (a > 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 10)
+(9 rows)
+
+explain (costs off) select * from mc3p where a >= 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p2
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p3
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p4
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p5
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 10)
+(17 rows)
+
+explain (costs off) select * from mc3p where a < 10;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (a < 10)
+ -> Seq Scan on mc3p1
+ Filter: (a < 10)
+ -> Seq Scan on mc3p_default
+ Filter: (a < 10)
+(7 rows)
+
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p1
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p2
+ Filter: ((a <= 10) AND (abs(b) < 10))
+ -> Seq Scan on mc3p_default
+ Filter: ((a <= 10) AND (abs(b) < 10))
+(9 rows)
+
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+ QUERY PLAN
+---------------------------------------------
+ Append
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 11) AND (abs(b) = 0))
+(3 rows)
+
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p6
+ Filter: ((a = 20) AND (c = 100) AND (abs(b) = 10))
+(3 rows)
+
+explain (costs off) select * from mc3p where a > 20;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on mc3p7
+ Filter: (a > 20)
+(3 rows)
+
+explain (costs off) select * from mc3p where a >= 20;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc3p5
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p6
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p7
+ Filter: (a >= 20)
+ -> Seq Scan on mc3p_default
+ Filter: (a >= 20)
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(7 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(9 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p5
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1) OR (a = 1))
+(11 rows)
+
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+ QUERY PLAN
+------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p1
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p2
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p4
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p5
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p6
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p7
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+(17 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p3
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p4
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 10)))
+(13 rows)
+
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ Append
+ -> Seq Scan on mc3p0
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p1
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p2
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1)) OR ((a = 10) AND (abs(b) = 9)))
+(9 rows)
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (2, minvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+explain (costs off) select * from mc2p where a < 2;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p0
+ Filter: (a < 2)
+ -> Seq Scan on mc2p1
+ Filter: (a < 2)
+ -> Seq Scan on mc2p2
+ Filter: (a < 2)
+ -> Seq Scan on mc2p_default
+ Filter: (a < 2)
+(9 rows)
+
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p3
+ Filter: ((b < 1) AND (a = 2))
+(3 rows)
+
+explain (costs off) select * from mc2p where a > 1;
+ QUERY PLAN
+--------------------------------
+ Append
+ -> Seq Scan on mc2p2
+ Filter: (a > 1)
+ -> Seq Scan on mc2p3
+ Filter: (a > 1)
+ -> Seq Scan on mc2p4
+ Filter: (a > 1)
+ -> Seq Scan on mc2p5
+ Filter: (a > 1)
+ -> Seq Scan on mc2p_default
+ Filter: (a > 1)
+(11 rows)
+
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+ QUERY PLAN
+---------------------------------------
+ Append
+ -> Seq Scan on mc2p2
+ Filter: ((b > 1) AND (a = 1))
+(3 rows)
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+explain (costs off) select * from boolpart where a in (true, false);
+ QUERY PLAN
+------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+ -> Seq Scan on boolpart_t
+ Filter: (a = ANY ('{t,f}'::boolean[]))
+(5 rows)
+
+explain (costs off) select * from boolpart where a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_t
+ Filter: (NOT a)
+ -> Seq Scan on boolpart_default
+ Filter: (NOT a)
+(7 rows)
+
+explain (costs off) select * from boolpart where not a = false;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: a
+ -> Seq Scan on boolpart_t
+ Filter: a
+ -> Seq Scan on boolpart_default
+ Filter: a
+(7 rows)
+
+explain (costs off) select * from boolpart where a is true or a is not true;
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS TRUE) OR (a IS NOT TRUE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT TRUE)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT TRUE)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not true and a is not false;
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_t
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+ -> Seq Scan on boolpart_default
+ Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
+(7 rows)
+
+explain (costs off) select * from boolpart where a is unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS UNKNOWN)
+(7 rows)
+
+explain (costs off) select * from boolpart where a is not unknown;
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on boolpart_f
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_t
+ Filter: (a IS NOT UNKNOWN)
+ -> Seq Scan on boolpart_default
+ Filter: (a IS NOT UNKNOWN)
+(7 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1a3ac4c1f9..edf5a93032 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -60,7 +60,7 @@ test: create_index create_view
# ----------
# Another group of parallel tests
# ----------
-test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func
+test: create_aggregate create_function_3 create_cast constraints triggers inherit create_table_like typed_table vacuum drop_if_exists updatable_views rolenames roleattributes create_am hash_func partition
# ----------
# sanity_check does a vacuum, affecting the sort order of SELECT *
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index a205e5d05c..8b609778b3 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -71,6 +71,7 @@ test: create_cast
test: constraints
test: triggers
test: inherit
+test: partition
test: create_table_like
test: typed_table
test: vacuum
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
new file mode 100644
index 0000000000..9dfcbe1e70
--- /dev/null
+++ b/src/test/regress/sql/partition.sql
@@ -0,0 +1,155 @@
+--
+-- Test partitioning planner code
+--
+create table lp (a char) partition by list (a);
+create table lp_default partition of lp default;
+create table lp_ef partition of lp for values in ('e', 'f');
+create table lp_ad partition of lp for values in ('a', 'd');
+create table lp_bc partition of lp for values in ('b', 'c');
+create table lp_g partition of lp for values in ('g');
+create table lp_null partition of lp for values in (null);
+explain (costs off) select * from lp;
+explain (costs off) select * from lp where a > 'a' and a < 'd';
+explain (costs off) select * from lp where a > 'a' and a <= 'd';
+explain (costs off) select * from lp where a = 'a';
+explain (costs off) select * from lp where 'a' = a; /* commutates */
+explain (costs off) select * from lp where a is not null;
+explain (costs off) select * from lp where a is null;
+explain (costs off) select * from lp where a = 'a' or a = 'c';
+explain (costs off) select * from lp where a is not null and (a = 'a' or a = 'c');
+explain (costs off) select * from lp where a <> 'g';
+explain (costs off) select * from lp where a <> 'a' and a <> 'd';
+explain (costs off) select * from lp where a not in ('a', 'd');
+
+-- collation matches the partitioning collation, pruning works
+create table coll_pruning (a text collate "C") partition by list (a);
+create table coll_pruning_a partition of coll_pruning for values in ('a');
+create table coll_pruning_b partition of coll_pruning for values in ('b');
+create table coll_pruning_def partition of coll_pruning default;
+explain (costs off) select * from coll_pruning where a collate "C" = 'a' collate "C";
+-- collation doesn't match the partitioning collation, no pruning occurs
+explain (costs off) select * from coll_pruning where a collate "POSIX" = 'a' collate "POSIX";
+
+create table rlp (a int, b varchar) partition by range (a);
+create table rlp_default partition of rlp default partition by list (a);
+create table rlp_default_default partition of rlp_default default;
+create table rlp_default_10 partition of rlp_default for values in (10);
+create table rlp_default_30 partition of rlp_default for values in (30);
+create table rlp_default_null partition of rlp_default for values in (null);
+create table rlp1 partition of rlp for values from (minvalue) to (1);
+create table rlp2 partition of rlp for values from (1) to (10);
+
+create table rlp3 (b varchar, a int) partition by list (b varchar_ops);
+create table rlp3_default partition of rlp3 default;
+create table rlp3abcd partition of rlp3 for values in ('ab', 'cd');
+create table rlp3efgh partition of rlp3 for values in ('ef', 'gh');
+create table rlp3nullxy partition of rlp3 for values in (null, 'xy');
+alter table rlp attach partition rlp3 for values from (15) to (20);
+
+create table rlp4 partition of rlp for values from (20) to (30) partition by range (a);
+create table rlp4_default partition of rlp4 default;
+create table rlp4_1 partition of rlp4 for values from (20) to (25);
+create table rlp4_2 partition of rlp4 for values from (25) to (29);
+
+create table rlp5 partition of rlp for values from (31) to (maxvalue) partition by range (a);
+create table rlp5_default partition of rlp5 default;
+create table rlp5_1 partition of rlp5 for values from (31) to (40);
+
+explain (costs off) select * from rlp where a < 1;
+explain (costs off) select * from rlp where 1 > a; /* commutates */
+explain (costs off) select * from rlp where a <= 1;
+explain (costs off) select * from rlp where a = 1;
+explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
+explain (costs off) select * from rlp where a <= 10;
+explain (costs off) select * from rlp where a > 10;
+explain (costs off) select * from rlp where a < 15;
+explain (costs off) select * from rlp where a <= 15;
+explain (costs off) select * from rlp where a > 15 and b = 'ab';
+explain (costs off) select * from rlp where a = 16;
+explain (costs off) select * from rlp where a = 16 and b in ('not', 'in', 'here');
+explain (costs off) select * from rlp where a = 16 and b < 'ab';
+explain (costs off) select * from rlp where a = 16 and b <= 'ab';
+explain (costs off) select * from rlp where a = 16 and b is null;
+explain (costs off) select * from rlp where a = 16 and b is not null;
+explain (costs off) select * from rlp where a is null;
+explain (costs off) select * from rlp where a is not null;
+explain (costs off) select * from rlp where a > 30;
+explain (costs off) select * from rlp where a = 30; /* only default is scanned */
+explain (costs off) select * from rlp where a <= 31;
+explain (costs off) select * from rlp where a = 1 or a = 7;
+explain (costs off) select * from rlp where a = 1 or b = 'ab';
+
+explain (costs off) select * from rlp where a > 20 and a < 27;
+explain (costs off) select * from rlp where a = 29;
+explain (costs off) select * from rlp where a >= 29;
+
+-- redundant clauses are eliminated
+explain (costs off) select * from rlp where a > 1 and a = 10; /* only default */
+explain (costs off) select * from rlp where a > 1 and a >=15; /* rlp3 onwards, including default */
+explain (costs off) select * from rlp where a = 1 and a = 3; /* empty */
+explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a = 15);
+
+-- multi-column keys
+create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
+create table mc3p_default partition of mc3p default;
+create table mc3p0 partition of mc3p for values from (minvalue, minvalue, minvalue) to (1, 1, 1);
+create table mc3p1 partition of mc3p for values from (1, 1, 1) to (10, 5, 10);
+create table mc3p2 partition of mc3p for values from (10, 5, 10) to (10, 10, 10);
+create table mc3p3 partition of mc3p for values from (10, 10, 10) to (10, 10, 20);
+create table mc3p4 partition of mc3p for values from (10, 10, 20) to (10, maxvalue, maxvalue);
+create table mc3p5 partition of mc3p for values from (11, 1, 1) to (20, 10, 10);
+create table mc3p6 partition of mc3p for values from (20, 10, 10) to (20, 20, 20);
+create table mc3p7 partition of mc3p for values from (20, 20, 20) to (maxvalue, maxvalue, maxvalue);
+
+explain (costs off) select * from mc3p where a = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
+explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
+explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
+explain (costs off) select * from mc3p where a > 10;
+explain (costs off) select * from mc3p where a >= 10;
+explain (costs off) select * from mc3p where a < 10;
+explain (costs off) select * from mc3p where a <= 10 and abs(b) < 10;
+explain (costs off) select * from mc3p where a = 11 and abs(b) = 0; /* empty */
+explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
+explain (costs off) select * from mc3p where a > 20;
+explain (costs off) select * from mc3p where a >= 20;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
+explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
+explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 9);
+
+-- a simpler multi-column keys case
+create table mc2p (a int, b int) partition by range (a, b);
+create table mc2p_default partition of mc2p default;
+create table mc2p0 partition of mc2p for values from (minvalue, minvalue) to (1, minvalue);
+create table mc2p1 partition of mc2p for values from (1, minvalue) to (1, 1);
+create table mc2p2 partition of mc2p for values from (1, 1) to (2, minvalue);
+create table mc2p3 partition of mc2p for values from (2, minvalue) to (2, 1);
+create table mc2p4 partition of mc2p for values from (2, 1) to (2, maxvalue);
+create table mc2p5 partition of mc2p for values from (2, maxvalue) to (maxvalue, maxvalue);
+
+explain (costs off) select * from mc2p where a < 2;
+explain (costs off) select * from mc2p where a = 2 and b < 1;
+explain (costs off) select * from mc2p where a > 1;
+explain (costs off) select * from mc2p where a = 1 and b > 1;
+
+-- boolean partitioning
+create table boolpart (a bool) partition by list (a);
+create table boolpart_default partition of boolpart default;
+create table boolpart_t partition of boolpart for values in ('true');
+create table boolpart_f partition of boolpart for values in ('false');
+
+explain (costs off) select * from boolpart where a in (true, false);
+explain (costs off) select * from boolpart where a = false;
+explain (costs off) select * from boolpart where not a = false;
+explain (costs off) select * from boolpart where a is true or a is not true;
+explain (costs off) select * from boolpart where a is not true;
+explain (costs off) select * from boolpart where a is not true and a is not false;
+explain (costs off) select * from boolpart where a is unknown;
+explain (costs off) select * from boolpart where a is not unknown;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
--
2.11.0
Attachment: 0004-Add-a-bms_add_range-v13.patch (text/plain; charset=UTF-8)
From 9d57121110c5d9eb2484b0b20523e6d7f505571c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 14 Nov 2017 16:05:52 +0900
Subject: [PATCH 4/8] Add a bms_add_range()
Authors: David Rowley, Kyotaro Horiguchi
---
src/backend/nodes/bitmapset.c | 72 +++++++++++++++++++++++++++++++++++++++++++
src/include/nodes/bitmapset.h | 1 +
2 files changed, 73 insertions(+)
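
For reviewers who want to convince themselves about the masking, here is a
standalone sketch of the word-level idea behind bms_add_range(). It is not
the patch code: it assumes a fixed 64-bit word and a caller-supplied array
that is already large enough, and the names set_bit_range and bit_is_set are
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64			/* assumption: 64-bit words in this sketch */

/*
 * Set bits lower..upper (inclusive) in words[].  The caller must make sure
 * that words[] has at least upper / WORD_BITS + 1 elements.
 */
static void
set_bit_range(uint64_t *words, int lower, int upper)
{
	int			lword = lower / WORD_BITS;
	int			uword = upper / WORD_BITS;
	/* mask with bit (lower % WORD_BITS) and everything above it set */
	uint64_t	lmask = ~(((uint64_t) 1 << (lower % WORD_BITS)) - 1);
	/* mask with bit (upper % WORD_BITS) and everything below it set */
	uint64_t	umask = ~(uint64_t) 0 >> (WORD_BITS - 1 - (upper % WORD_BITS));
	int			w;

	assert(lower >= 0 && lower <= upper);

	if (lword == uword)
	{
		/* both ends fall in one word, so both masks apply to it */
		words[lword] |= lmask & umask;
		return;
	}

	words[lword] |= lmask;				/* partial first word */
	for (w = lword + 1; w < uword; w++)
		words[w] = ~(uint64_t) 0;		/* full intermediate words */
	words[uword] |= umask;				/* partial last word */
}

/* Return 1 if the given bit is set, else 0. */
static int
bit_is_set(const uint64_t *words, int bit)
{
	return (int) ((words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1);
}
```

The point of working per word is that a range spanning many words costs one
store per intermediate word instead of one bms_add_member-style call per bit.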
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index d4b82c6305..e5096e01a7 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -785,6 +785,78 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
}
/*
+ * bms_add_range
+ * Add members in the range of 'lower' to 'upper' to the set.
+ *
+ * Note this could also be done by calling bms_add_member in a loop, however,
+ * using this function will be faster when the range is large as we work with
+ * at the bitmapword level rather than at bit level.
+ */
+Bitmapset *
+bms_add_range(Bitmapset *a, int lower, int upper)
+{
+ int lwordnum,
+ lbitnum,
+ uwordnum,
+ ushiftbits,
+ wordnum;
+
+ if (lower < 0 || upper < 0)
+ elog(ERROR, "negative bitmapset member not allowed");
+ if (lower > upper)
+ elog(ERROR, "lower range must not be above upper range");
+ uwordnum = WORDNUM(upper);
+
+ if (a == NULL)
+ {
+ a = (Bitmapset *) palloc0(BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ }
+
+ /* ensure we have enough words to store the upper bit */
+ else if (uwordnum >= a->nwords)
+ {
+ int oldnwords = a->nwords;
+ int i;
+
+ a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(uwordnum + 1));
+ a->nwords = uwordnum + 1;
+ /* zero out the enlarged portion */
+ for (i = oldnwords; i < a->nwords; i++)
+ a->words[i] = 0;
+ }
+
+ wordnum = lwordnum = WORDNUM(lower);
+
+ lbitnum = BITNUM(lower);
+ ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
+
+ /*
+ * Special case: when lwordnum is the same as uwordnum, we must perform
+ * both the upper and lower masking on that one word.
+ */
+ if (lwordnum == uwordnum)
+ {
+ a->words[lwordnum] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1)
+ & (~(bitmapword) 0) >> ushiftbits;
+ }
+ else
+ {
+ /* turn on lbitnum and all bits left of it */
+ a->words[wordnum++] |= ~(bitmapword) (((bitmapword) 1 << lbitnum) - 1);
+
+ /* turn on all bits for any intermediate words */
+ while (wordnum < uwordnum)
+ a->words[wordnum++] = ~(bitmapword) 0;
+
+ /* turn on upper's bit and all bits right of it. */
+ a->words[uwordnum] |= (~(bitmapword) 0) >> ushiftbits;
+ }
+
+ return a;
+}
+
+/*
* bms_int_members - like bms_intersect, but left input is recycled
*/
Bitmapset *
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index aa3fb253c2..3b62a97775 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -90,6 +90,7 @@ extern bool bms_is_empty(const Bitmapset *a);
extern Bitmapset *bms_add_member(Bitmapset *a, int x);
extern Bitmapset *bms_del_member(Bitmapset *a, int x);
extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);
extern Bitmapset *bms_join(Bitmapset *a, Bitmapset *b);
--
2.11.0
Attachment: 0005-Planner-side-changes-for-partition-pruning-v13.patch (text/plain; charset=UTF-8)
From 6e299912c32d55b9ce63f7fc07dddb99ec6df97d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 5/8] Planner-side changes for partition-pruning
This adds all the necessary planner code and representations viz.
0. Code to teach set_append_rel_size/pathlist to look at only
the *live* partitions of partitioned tables.
1. Add a field partcollation to PartitionScheme, which will be
needed to verify that an operator clause's input collation
indeed matches what is used for partitioning, to be able
to use the clause for partition-pruning (using parttypcoll
won't be correct, because that's not what's used by
partitioning)
2. Code to match the clauses to the table's partition key and
generate a list of such matching clauses.
3. Add a field to RelOptInfo to store an array of pointers to the
AppendRelInfos of *all* partitions (stored in the same order as
their RelOptInfos in part_rels)
4. Add a field to RelOptInfo to store a list of AppendRelInfos
of *live* partitions that survived partition-pruning (although
as of this commit this contains *all* appinfos as mentioned
below).
5. Some code in try_partition_wise_join to handle the
possibility that a partition RelOptInfo may not have the basic
information set (as noted in 0, set_append_rel_size
now sets such information only for the *live* partitions)
If the clauses identified in 2 above do not contain the values
necessary to perform partition pruning, get_partitions_from_clauses
would return without pruning any partitions. In most cases, it's
obvious in the planner that a set of clauses identified as matching
the partition key doesn't contain the constant values yet, in
which case there is no need to call get_partitions_from_clauses
right away. Instead, the call should be deferred to another piece of
code that can receive the above list of clauses and that runs at a time
when the constant values become available.
In addition, a stub function get_partitions_from_clauses is added in
partition.c, which currently simply returns all partitions from the
partition descriptor.
Authors: Amit Langote, Dilip Kumar
---
src/backend/catalog/partition.c | 20 ++
src/backend/optimizer/path/allpaths.c | 608 +++++++++++++++++++++++++++-------
src/backend/optimizer/path/indxpath.c | 3 -
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/plan/planner.c | 19 +-
src/backend/optimizer/util/plancat.c | 4 +
src/backend/optimizer/util/relnode.c | 101 ++++++
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 29 +-
src/include/optimizer/pathnode.h | 4 +
11 files changed, 685 insertions(+), 133 deletions(-)
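
To illustrate the clause-matching rule that match_clauses_to_partkey applies
to operator clauses (accept "partkey op const", and accept "const op partkey"
only if the operator has a commutator), here is a toy sketch outside the
planner. All of the types and functions below (ToyOp, ToyOperand, ToyClause,
toy_commutator, toy_match_clause) are hypothetical and exist only for this
example; the real code works on OpExpr nodes and uses get_commutator().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator set; OP_NONE stands for "no commutator exists". */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_NONE } ToyOp;

/* An operand is either the partition key or a plain integer constant. */
typedef struct
{
	int			is_key;
	int			value;			/* meaningful only when !is_key */
} ToyOperand;

typedef struct
{
	ToyOperand	left;
	ToyOp		op;
	ToyOperand	right;
} ToyClause;

/* Operator to use after swapping the two arguments. */
static ToyOp
toy_commutator(ToyOp op)
{
	switch (op)
	{
		case OP_LT: return OP_GT;
		case OP_LE: return OP_GE;
		case OP_EQ: return OP_EQ;
		case OP_GE: return OP_LE;
		case OP_GT: return OP_LT;
		default:	return OP_NONE;
	}
}

/*
 * If the clause can be read as "key op value", store op and value and
 * return 1; otherwise return 0.
 */
static int
toy_match_clause(const ToyClause *c, ToyOp *op, int *value)
{
	assert(c != NULL && op != NULL && value != NULL);

	if (c->left.is_key && !c->right.is_key)
	{
		*op = c->op;
		*value = c->right.value;
		return 1;
	}
	if (c->right.is_key && !c->left.is_key)
	{
		ToyOp		comm = toy_commutator(c->op);

		/* cannot flip the arguments without a commutator, so give up */
		if (comm == OP_NONE)
			return 0;
		*op = comm;
		*value = c->left.value;
		return 1;
	}
	/* neither side is the key, or both are: not usable for pruning */
	return 0;
}
```

With this, a clause shaped like "1 > a" comes out as (OP_LT, 1), i.e. the
same thing as "a < 1", which is what the "commutates" cases in the regression
test are meant to exercise.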
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e032c11ed4..a8ddd4fab2 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1529,6 +1529,26 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+
+ Assert(partclauses != NIL);
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+}
+
/* Module-local functions */
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 906d08ab37..6b087ec15f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,9 +20,12 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
+#include "catalog/pg_type.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -135,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -846,6 +857,399 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests against partition keys. contains_const and constfalse are set by
+ * match_clauses_to_partkey as described in its header comment.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ Relation parent = heap_open(rte->relid, NoLock);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ Bitmapset *partindexes;
+ List *result = NIL;
+ int i;
+
+ /*
+ * If we have matched clauses that contain at least one constant
+ * operand, then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ else
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+
+ result = lappend(result, appinfo);
+ }
+
+ /* Record which partitions must be scanned. */
+ rel->live_part_appinfos = result;
+
+ heap_close(parent, NoLock);
+
+ return result;
+ }
+
+ return NIL;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contain a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(estimate_expression_value(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -860,6 +1264,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -873,6 +1278,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -893,7 +1315,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -906,10 +1328,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -920,85 +1338,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
@@ -1164,6 +1508,19 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
has_live_children = true;
/*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel.
+ * Note that rel itself (the parent) might just be a UNION ALL
+ * subquery, in which case there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
+ /*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
* in which some children are farmed out to workers while others are
@@ -1259,14 +1616,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1337,43 +1709,40 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
+ /*
+ * AppendPath we are about to generate must record the RT indexes of
+ * partitioned tables that are direct or indirect children of this Append
+ * rel. For a partitioned table, we collect its live partitioned children
+ * from rel->live_partitioned_rels. However, it will contain only its immediate children,
+ * so collect live partitioned children from all children that are
+ * themselves partitioned and concatenate to our list before finally
+ * passing the list to create_append_path() and/or
+ * generate_mergeappend_paths().
+ *
+ * If this is a sub-query RTE, its RelOptInfo doesn't itself contain the
+ * list of live partitioned children, so we must assemble the same in the
+ * loop below from the children that are known to correspond to
+ * partitioned rels. (This assumes that we don't need to look through
+ * multiple levels of subquery RTEs; if we ever do, we could consider
+ * stuffing the list we generate here into sub-query RTE's RelOptInfo, just
+ * like we do for partitioned rels, which would be used when populating our
+ * parent rel with paths. For the present, that appears to be
+ * unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For joinrel consisting of root partitioned tables, get
+ * partitioned_rels list by combining live_partitioned_rels of the
+ * component partitioned tables.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1390,17 +1759,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *lcp;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 18f6bafcdd..32de48128f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -40,9 +40,6 @@
#include "utils/selfuncs.h"
-#define IsBooleanOpfamily(opfamily) \
- ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
-
#define IndexCollMatchesExprColl(idxcollation, exprcollation) \
((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 453f25964a..b491fb9099 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1396,6 +1396,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f6b8bbf5fa..aaa342c2ab 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6192,14 +6192,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 9d35a41e22..e1ef936e68 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1918,6 +1918,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index cb94c318a7..a968fa4586 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -18,6 +18,7 @@
#include "miscadmin.h"
#include "catalog/partition.h"
+#include "catalog/pg_class.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/pathnode.h"
@@ -154,9 +155,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +237,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +266,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +576,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +744,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -1746,3 +1757,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 295e9d224e..2041de5bca 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9e68e65cc6..94c2e8d011 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's type,
+ * store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
@@ -529,6 +534,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +663,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e9ed16ad32..c1f2fc93cd 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -296,5 +296,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
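As an aside for readers of the patch above, the following is a minimal standalone sketch (not part of the patch) of how live_partitioned_rels appears to accumulate: each partitioned rel starts with its own RT index, and set_append_rel_size() then concatenates the list of every live partitioned child onto the parent's list. SketchRel, sketch_init_rel, sketch_add_live_child and MAX_RELS are hypothetical names used only for illustration; the real code works on RelOptInfo and List, and this sketch assumes children are processed before their lists are propagated upward.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 16				/* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the RelOptInfo fields touched by the patch. */
typedef struct SketchRel
{
	int			relid;			/* RT index of this rel */
	int			is_partitioned; /* nonzero if it is a partitioned table */
	int			live_partitioned[MAX_RELS]; /* RT indexes, own index first */
	int			n_live_partitioned;
} SketchRel;

/*
 * Analogous to "rel->live_partitioned_rels = list_make1_int(rti)": a
 * partitioned rel's list starts out containing only its own RT index.
 */
static void
sketch_init_rel(SketchRel *rel, int relid, int is_partitioned)
{
	rel->relid = relid;
	rel->is_partitioned = is_partitioned;
	rel->n_live_partitioned = 0;
	if (is_partitioned)
		rel->live_partitioned[rel->n_live_partitioned++] = relid;
}

/*
 * Analogous to the list_concat() step for a live child: only when both
 * parent and child are partitioned is the child's list appended to the
 * parent's list.  Leaf partitions contribute nothing.
 */
static void
sketch_add_live_child(SketchRel *parent, const SketchRel *child)
{
	int			i;

	if (!parent->is_partitioned || !child->is_partitioned)
		return;

	for (i = 0; i < child->n_live_partitioned; i++)
	{
		assert(parent->n_live_partitioned < MAX_RELS);
		parent->live_partitioned[parent->n_live_partitioned++] =
			child->live_partitioned[i];
	}
}
```

With a root (RT index 1) whose live child 2 is itself partitioned and has a live partitioned child 3 plus a leaf 4, processing the lower level first leaves the root with the list 1, 2, 3, which is the shape the Append path would then record as its partitioned_rels.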
0006-Implement-get_partitions_from_clauses-v13.patch (text/plain; charset=UTF-8)
From 28a377179d060067e48d4b57768a339c6cb10185 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 6/8] Implement get_partitions_from_clauses
This now actually processes partclauses and classifies them into
a set of keys that can be used to look up partitions in the
partition descriptor, although there is still no support for the
latter.
---
src/backend/catalog/partition.c | 1205 ++++++++++++++++++++++++++++++-
src/backend/optimizer/util/clauses.c | 4 +-
src/include/optimizer/clauses.h | 2 +
src/test/regress/expected/partition.out | 15 +-
4 files changed, 1209 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index a8ddd4fab2..8a7d305357 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -131,6 +135,69 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, so we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -178,6 +245,25 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static int32 partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1530,7 +1616,7 @@ get_partition_qual_relid(Oid relid)
}
/*
- * get_partitions_using_clauses
+ * get_partitions_from_clauses
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
*
@@ -1541,17 +1627,1128 @@ Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
List *partclauses)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
- Bitmapset *result = NULL;
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
Assert(partclauses != NIL);
- result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
return result;
}
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses);
+
+ /*
+ * The analysis of the matched clauses done by
+ * classify_partition_bounding_keys may have found mutually contradictory
+ * clauses.
+ */
+ if (!constfalse)
+ {
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys
+ * yet. They will be handled below by recursively calling this
+ * function for each of OR clauses' arguments and combining the
+ * resulting partition sets appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ }
+ else
+ return NULL;
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ Bitmapset *or_partset = NULL;
+
+ foreach(lc1, or->args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc1));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to or_partset. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation,
+ rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ or_partset = bms_union(or_partset, arg_partset);
+ }
+
+ /*
+ * Partition sets obtained from mutually-conjunctive clauses are
+ * combined using set intersection.
+ */
+ result = bms_intersect(result, or_partset);
+ }
+
+ return result;
+}
+
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) ||\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing an equality operator, unless hash
+ * partitioning is in use, in which case some keys may have IS NULL clauses
+ * while the remaining ones have clauses with an equality operator.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_or_clauses = true;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /* Set partexpr if needed. */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ partexpr = copyObject(lfirst(partexprs_item));
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is '<>', check if its
+ * negator is an equality operator. If so and it's a btree
+ * equality operator, we can use a special trick to prune
+ * partitions that won't satisfy the original '<>'
+ * operator -- we generate an OR expression
+ * 'leftop < rightop OR leftop > rightop' and add it to
+ * *or_clauses.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ {
+ Expr *or;
+
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ or = makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr), -1);
+ *or_clauses = lappend(*or_clauses, or);
+ continue;
+ }
+ }
+
+ /*
+ * Getting here means opclause uses an ordering op and
+ * hash partitioning is in use. We shouldn't try to
+ * reason about such an operator for the purposes of
+ * partition pruning, because hash partitioning doesn't
+ * make partitioning decisions based on relative ordering
+ * of keys.
+ */
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = get_commutator(opclause->opno);
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+
+ /*
+ * Since we only allow strict operators, require keys to be
+ * not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not
+ * listed in any operator family. We can still handle it
+ * if its negator is a btree equality operator in the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++) /* must not clobber outer loop's i */
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ only_or_clauses = false;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_or_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ int32 op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ if (op_strategy < 0 &&
+ need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ else if (op_strategy == 0)
+ {
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ }
+ else if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found an equal key for every partition key
+ * column. If so, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
+ * =, and >/>= operator, respectively. Sets *incl to true if equality is
+ * implied.
+ */
+static int32
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ int32 result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = 0;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = -1;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = 0;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = 1;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *datum to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ /*
+ * If couldn't coerce to the partition key type, that is, the type of
+ * datums stored in PartitionBoundInfo, no hope of using this
+ * expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * add this one directly to the result. The caller will
+ * arbitrarily choose one of them and perform partition-pruning
+ * with it. It's possible that mutual contradiction is proved at
+ * some higher level; we just couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * we couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff, ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ Bitmapset *result = NULL;
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index d14ef31eae..72f1fa30a6 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4744,7 +4742,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index 963561dbfe..d44ff4f608 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -1049,16 +1049,11 @@ explain (costs off) select * from boolpart where a is not true;
(7 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
--
2.11.0
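To make the "best min bound" rule from the header comment of classify_partition_bounding_keys() a bit more concrete, here is a minimal standalone sketch. It is not code from the patch: it uses plain ints instead of Datums and PartClauses, and the names MinBound, min_bound_more_restrictive and best_min_bound are made up purely for illustration. It only shows how, among ANDed clauses such as a > 1, a > 2 and a >= 5, one would settle on 5 as an inclusive lower bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a "key > value" or "key >= value" clause. */
typedef struct MinBound
{
	int			value;
	bool		inclusive;		/* true for >=, false for > */
} MinBound;

/*
 * Return true if 'a' is at least as restrictive a lower bound as 'b'.
 * A larger value wins; for equal values, the exclusive bound (>) wins.
 */
static bool
min_bound_more_restrictive(const MinBound *a, const MinBound *b)
{
	if (a->value != b->value)
		return a->value > b->value;
	return !a->inclusive || b->inclusive;
}

/* Pick the most restrictive lower bound out of n (n >= 1) ANDed clauses. */
static MinBound
best_min_bound(const MinBound *bounds, size_t n)
{
	MinBound	best;
	size_t		i;

	assert(n >= 1);
	best = bounds[0];
	for (i = 1; i < n; i++)
	{
		if (min_bound_more_restrictive(&bounds[i], &best))
			best = bounds[i];
	}
	return best;
}
```

Feeding it {a > 1, a > 2, a >= 5} yields value 5 with inclusive set, matching the example in the comment. The patch itself does the equivalent comparison through the partitioning operator family in partition_cmp_args(), which this sketch does not attempt to model.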
0007-Some-interface-changes-for-partition_bound_-cmp-bsea-v13.patch
From 83ceae462e490d0403ff731c0f94f463745d8bde Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 7/8] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward, so instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
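As a rough illustration of why the probe needs to carry its own datum count, here is a small standalone sketch. It is not the patch's code: ProbeArg and bound_prefix_cmp are hypothetical names, keys are plain ints rather than Datums, and only the "tuple datums" half of PartitionBoundCmpArg is modelled. The point is just that the comparison stops after the caller-specified number of columns, so a probe giving only a prefix of the key can still be compared against a full bound.

```c
#include <assert.h>

/* Simplified probe: a datum array plus how many entries are valid. */
typedef struct ProbeArg
{
	const int  *datums;
	int			ndatums;
} ProbeArg;

/*
 * Compare the first arg->ndatums columns of a bound against the probe.
 * Returns -1, 0 or 1 if the bound prefix is less than, equal to, or
 * greater than the probe, respectively.
 */
static int
bound_prefix_cmp(const int *bound_datums, int nkeys, const ProbeArg *arg)
{
	int			i;

	assert(arg->ndatums >= 0 && arg->ndatums <= nkeys);
	for (i = 0; i < arg->ndatums; i++)
	{
		if (bound_datums[i] < arg->datums[i])
			return -1;
		if (bound_datums[i] > arg->datums[i])
			return 1;
	}
	return 0;
}
```

With a two-column bound (10, 20), a probe of just (10) with ndatums = 1 compares as equal, whereas a full probe (10, 30) compares the bound as smaller. The old interface always compared key->partnatts columns, which is what made prefix probes impossible to express.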
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8a7d305357..2f4501576a 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -198,6 +198,31 @@ typedef struct PartScanKeyInfo
bool keyisnotnull[PARTITION_MAX_KEYS];
} PartScanKeyInfo;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -230,14 +255,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -1066,6 +1092,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -1080,8 +1108,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1154,10 +1188,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1208,6 +1248,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1229,8 +1270,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1244,9 +1288,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -3746,12 +3790,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -3780,11 +3827,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -3992,12 +4043,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -4019,11 +4070,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -4032,25 +4083,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -4061,12 +4142,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -4080,20 +4162,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -4106,8 +4187,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
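
The contract documented for partition_bound_bsearch() in the patch above (return the offset of the greatest bound that is <= the probe, return -1 if every bound is greater, and report equality separately) can be pictured on a plain sorted int array. What follows is only a sketch under that assumption: greatest_bound_le and its integer comparison are hypothetical stand-ins, whereas the real function compares through partition_bound_cmp() and a PartitionBoundCmpArg.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for partition_bound_bsearch().
 * Returns the offset of the greatest element of bounds[] that is <= probe,
 * or -1 if all elements are greater.  *is_equal reports whether the element
 * at the returned offset equals probe.
 */
static int
greatest_bound_le(const int *bounds, int nbounds, int probe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        /* Round up so that "lo = mid" always makes progress. */
        int mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= probe)
        {
            lo = mid;
            *is_equal = (bounds[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }

    return lo;
}
```

Starting lo at -1 is what lets the "no bound is small enough" case fall out without a special branch.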
Attachment: 0008-Implement-get_partitions_for_keys-v13.patch (text/plain; charset=UTF-8)
From c658acad1fea6d0d254cbf3bfea7bdbe017a1eeb Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 18 Oct 2017 17:14:53 +0900
Subject: [PATCH 8/8] Implement get_partitions_for_keys
Disable constraint_exclusion pruning using internal partition
constraints for select queries, because we've now got the new pruning
working.
---
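
To illustrate the equality case for range partitioning in get_partitions_for_keys(): the search finds the greatest bound that is <= the key, and the entry at the next offset in indexes[] names the partition bounded above by the following bound, or is negative for an unassigned range, in which case the default partition (if any) is used. A minimal sketch on integer bounds follows. ToyRangeBounds, toy_bsearch_le and toy_range_eq_lookup are hypothetical names, and the layout (ndatums + 1 index entries) is an assumption taken from the comments in this patch, not the real PartitionBoundInfo.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of range bounds over a single int key. */
typedef struct ToyRangeBounds
{
    int         ndatums;        /* number of bounds in datums[] */
    const int  *datums;         /* sorted bounds */
    const int  *indexes;        /* ndatums + 1 entries; indexes[i] is the
                                 * partition whose upper bound is datums[i],
                                 * or -1 if that range is unassigned */
    int         default_index;  /* default partition, or -1 if none */
} ToyRangeBounds;

/* Offset of the greatest bound <= key, or -1 if all bounds are greater. */
static int
toy_bsearch_le(const ToyRangeBounds *b, int key)
{
    int lo = -1;
    int hi = b->ndatums - 1;

    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (b->datums[mid] <= key)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Partition to scan for "key = const", or -1 if no partition qualifies. */
static int
toy_range_eq_lookup(const ToyRangeBounds *b, int key)
{
    int off = toy_bsearch_le(b, key);

    /* The bound after the one found is the candidate's upper bound. */
    if (off >= 0)
        off += 1;

    if (off >= 0 && b->indexes[off] >= 0)
        return b->indexes[off];

    /* Key fell into an unassigned range; fall back to the default. */
    return b->default_index;
}
```

With partitions [0, 10) and [20, 30) this model has datums {0, 10, 20, 30} and indexes {-1, 0, -1, 1, -1}, so a key of 5 maps to partition 0 while a key of 15 falls through to the default.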
src/backend/catalog/partition.c | 431 +++++++++++++++++++++++++++++++-
src/backend/optimizer/util/plancat.c | 29 ++-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition.out | 345 +++++++++++++++++++++----
src/test/regress/sql/partition.sql | 47 +++-
5 files changed, 788 insertions(+), 72 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2f4501576a..f07ac1529e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2784,10 +2784,437 @@ partition_cmp_args(PartitionKey key, int partattoff,
static Bitmapset *
get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
Bitmapset *result = NULL;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool hash_isnull[PARTITION_MAX_KEYS];
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return NULL;
+
+ memset(hash_isnull, false, sizeof(hash_isnull));
+ /* Handle null partition keys. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ switch (partkey->strategy)
+ {
+ /*
+ * Hash partitioning puts nulls into a normal partition
+ * and doesn't require defining a special null-accepting
+ * partition. So, we let this fall through and get
+ * handled by the code below that handles equality
+ * keys.
+ */
+ case PARTITION_STRATEGY_HASH:
+ hash_isnull[i] = true;
+ keys->n_eqkeys++;
+ break;
+
+ /*
+ * In range and list partitioning cases, only a designated
+ * partition will accept nulls.
+ */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ result = bms_make_singleton(other_idx);
+ return result;
+ }
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+ return result;
+ }
+
+
+ /*
+ * Determine set of partitions using provided keys, which proceeds in a
+ * manner determined by the partitioning method.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ switch (partkey->strategy)
+ {
+ /* Hash-partitioning is real simple. */
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys,
+ hash_isnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ result = bms_make_singleton(result_index);
+
+ return result;
+ }
+
+ /* Range and list partitioning take a bit more work. */
+
+ case PARTITION_STRATEGY_LIST:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /* For list partition, must exactly match the datum. */
+ if (eqoff >= 0 && !is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (eqoff >= 0)
+ eqoff += 1;
+ break;
+ }
+
+ /*
+ * Return the partition that eqkeys identified. If they didn't
+ * identify a specific partition or identified a range of unassigned
+ * values, return the default partition instead, if there is one.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ result = bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return result;
+ }
+
+ /*
+ * Hash partitioning doesn't understand non-equality conditions, so
+ * return all partitions.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case,
+ * set minoff to the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if
+ * it does but minkey isn't inclusive, move to the bound on
+ * the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bounds share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged,
+ * because it simply means none of the partitions satisfies
+ * maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal),
+ * but the maxkey is not inclusive, then go to the bound on the
+ * left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+
+ /*
+ * maxoff may have become -1, which again means no partition
+ * satisfies the maxkeys.
+ */
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * All datums between those at minoff and maxoff satisfy the
+ * query keys, so add the corresponding partitions to the
+ * result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
- result = bms_add_range(result, 0, partdesc->nparts - 1);
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition
+ * if certain keys could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ }
+
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
return result;
}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index e1ef936e68..6826f5fc9d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1248,21 +1247,25 @@ get_relation_constraints(PlannerInfo *root,
}
/* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index fac7b62f9c..5a74151c8f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1713,11 +1713,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1904,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition.out b/src/test/regress/expected/partition.out
index d44ff4f608..400e97eb94 100644
--- a/src/test/regress/expected/partition.out
+++ b/src/test/regress/expected/partition.out
@@ -120,6 +120,8 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
QUERY PLAN
-------------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_ef
@@ -128,12 +130,14 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-(9 rows)
+(11 rows)
explain (costs off) select * from lp where a not in ('a', 'd');
QUERY PLAN
------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_bc
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_ef
@@ -142,7 +146,7 @@ explain (costs off) select * from lp where a not in ('a', 'd');
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_default
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-(9 rows)
+(11 rows)
-- collation matches the partitioning collation, pruning works
create table coll_pruning (a text collate "C") partition by list (a);
@@ -208,16 +212,14 @@ explain (costs off) select * from rlp where 1 > a; /* commutates */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +521,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +575,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +649,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +657,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -692,7 +688,9 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) < 1;
Append
-> Seq Scan on mc3p0
Filter: ((a = 1) AND (abs(b) < 1))
-(3 rows)
+ -> Seq Scan on mc3p_default
+ Filter: ((a = 1) AND (abs(b) < 1))
+(5 rows)
explain (costs off) select * from mc3p where a = 1 and abs(b) = 1;
QUERY PLAN
@@ -714,9 +712,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -813,12 +809,14 @@ explain (costs off) select * from mc3p where a = 20 and abs(b) = 10 and c = 100;
(3 rows)
explain (costs off) select * from mc3p where a > 20;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+--------------------------------
Append
-> Seq Scan on mc3p7
Filter: (a > 20)
-(3 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (a > 20)
+(5 rows)
explain (costs off) select * from mc3p where a >= 20;
QUERY PLAN
@@ -844,7 +842,9 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
-(7 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)))
+(9 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1;
QUERY PLAN
@@ -858,7 +858,9 @@ explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
-> Seq Scan on mc3p5
Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
-(9 rows)
+ -> Seq Scan on mc3p_default
+ Filter: (((a = 1) AND (abs(b) = 1) AND (c = 1)) OR ((a = 10) AND (abs(b) = 5) AND (c = 10)) OR ((a > 11) AND (a < 20)) OR (a < 1))
+(11 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1 and c = 1) or (a = 10 and abs(b) = 5 and c = 10) or (a > 11 and a < 20) or a < 1 or a = 1;
QUERY PLAN
@@ -886,6 +888,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -896,7 +900,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -957,9 +961,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1001,28 +1007,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1032,21 +1030,15 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
QUERY PLAN
@@ -1079,4 +1071,253 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- some more cases
+-- pruning for partitioned table appearing inside a sub-query
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
diff --git a/src/test/regress/sql/partition.sql b/src/test/regress/sql/partition.sql
index 9dfcbe1e70..2a623afd2f 100644
--- a/src/test/regress/sql/partition.sql
+++ b/src/test/regress/sql/partition.sql
@@ -152,4 +152,49 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- some more cases
+
+-- pruning for partitioned table appearing inside a sub-query
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
--
2.11.0
Hi Amit,
On 11/22/2017 03:59 AM, Amit Langote wrote:
Fixed in the attached. No other changes beside that.
I have been using the following script to look at the patch
-- test.sql --
CREATE TABLE t1 (
a integer NOT NULL,
b integer NOT NULL
) PARTITION BY HASH (b);
CREATE TABLE t1_p00 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 0);
CREATE TABLE t1_p01 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 1);
CREATE TABLE t1_p02 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 2);
CREATE TABLE t1_p03 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 3);
CREATE INDEX idx_t1_b_a_p00 ON t1_p00 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p01 ON t1_p01 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p02 ON t1_p02 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p03 ON t1_p03 USING btree (b, a);
CREATE TABLE t2 (
c integer NOT NULL,
d integer NOT NULL
) PARTITION BY HASH (d);
CREATE TABLE t2_p00 PARTITION OF t2 FOR VALUES WITH (MODULUS 4,
REMAINDER 0);
CREATE TABLE t2_p01 PARTITION OF t2 FOR VALUES WITH (MODULUS 4,
REMAINDER 1);
CREATE TABLE t2_p02 PARTITION OF t2 FOR VALUES WITH (MODULUS 4,
REMAINDER 2);
CREATE TABLE t2_p03 PARTITION OF t2 FOR VALUES WITH (MODULUS 4,
REMAINDER 3);
CREATE INDEX idx_t2_c_p00 ON t2_p00 USING btree (c);
CREATE INDEX idx_t2_c_p01 ON t2_p01 USING btree (c);
CREATE INDEX idx_t2_c_p02 ON t2_p02 USING btree (c);
CREATE INDEX idx_t2_c_p03 ON t2_p03 USING btree (c);
CREATE INDEX idx_t2_d_p00 ON t2_p00 USING btree (d);
CREATE INDEX idx_t2_d_p01 ON t2_p01 USING btree (d);
CREATE INDEX idx_t2_d_p02 ON t2_p02 USING btree (d);
CREATE INDEX idx_t2_d_p03 ON t2_p03 USING btree (d);
INSERT INTO t1 (SELECT i, i FROM generate_series(1, 10000) AS i);
INSERT INTO t2 (SELECT i, i FROM generate_series(1, 10000) AS i);
ANALYZE;
EXPLAIN (ANALYZE) SELECT t1.a, t1.b FROM t1 WHERE t1.b = 1;
EXPLAIN (ANALYZE) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER JOIN t2 ON
t2.c = t1.b WHERE t2.d = 1;
BEGIN;
EXPLAIN (ANALYZE) UPDATE t1 SET a = 1 WHERE b = 1;
ROLLBACK;
BEGIN;
EXPLAIN (ANALYZE) DELETE FROM t1 WHERE b = 1;
ROLLBACK;
-- test.sql --
I just wanted to highlight that the "JOIN ON" partition isn't pruned -
the "WHERE" one is.
Should pruning of partitions for UPDATEs (where the partition key isn't
updated) and DELETEs be added to the TODO list ?
Thanks for working on this !
Best regards,
Jesper
Thanks Jesper.
On 2017/11/23 3:56, Jesper Pedersen wrote:
Hi Amit,
On 11/22/2017 03:59 AM, Amit Langote wrote:
Fixed in the attached. No other changes beside that.
I have been using the following script to look at the patch
-- test.sql --
[ ... ]
EXPLAIN (ANALYZE) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER JOIN t2 ON
t2.c = t1.b WHERE t2.d = 1;
I just wanted to highlight that the "JOIN ON" partition isn't pruned - the
"WHERE" one is.
Did you mean to write ON t2.d = t1.b? If so, equivalence class mechanism
will give rise to a t1.b = 1 and hence help prune t1's partition as well:
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.b, t2.c, t2.d
FROM t1 INNER JOIN t2 ON t2.d = t1.b
WHERE t2.d = 1;
QUERY PLAN
-----------------------------------------------------------
Nested Loop
-> Append
-> Bitmap Heap Scan on t1_p00
Recheck Cond: (b = 1)
-> Bitmap Index Scan on idx_t1_b_a_p00
Index Cond: (b = 1)
-> Materialize
-> Append
-> Bitmap Heap Scan on t2_p00
Recheck Cond: (d = 1)
-> Bitmap Index Scan on idx_t2_d_p00
Index Cond: (d = 1)
In your original query, you use ON t2.c = t1.b, whereby there is no
"constant" value to perform partition pruning with. t2.c is unknown until
the join actually executes.
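To make the difference between the two ON clauses concrete, here is a minimal sketch of the equivalence-class idea, assuming a toy union-find over four columns. The names (EqClasses, eq_add_col_eq_col, and so on) are hypothetical and are not the planner's actual data structures; the sketch only shows why t2.d = t1.b plus t2.d = 1 yields a constant for t1.b, while t2.c = t1.b plus t2.d = 1 does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical column ids for the toy example; not PostgreSQL identifiers. */
enum { COL_T1_A, COL_T1_B, COL_T2_C, COL_T2_D, NCOLS };

typedef struct EqClasses
{
    int  parent[NCOLS];     /* union-find parent links */
    bool has_const[NCOLS];  /* meaningful only at a class root */
    int  const_val[NCOLS];  /* constant known to equal the class */
} EqClasses;

void
eq_init(EqClasses *ec)
{
    for (int i = 0; i < NCOLS; i++)
    {
        ec->parent[i] = i;
        ec->has_const[i] = false;
        ec->const_val[i] = 0;
    }
}

/* Return the root column of the class that col belongs to. */
int
eq_find(const EqClasses *ec, int col)
{
    assert(col >= 0 && col < NCOLS);
    while (ec->parent[col] != col)
        col = ec->parent[col];
    return col;
}

/* Record "a = b", e.g. the join clause t2.d = t1.b. */
void
eq_add_col_eq_col(EqClasses *ec, int a, int b)
{
    int ra = eq_find(ec, a);
    int rb = eq_find(ec, b);

    if (ra == rb)
        return;
    ec->parent[ra] = rb;
    /* Carry over a constant already known for the absorbed class. */
    if (ec->has_const[ra] && !ec->has_const[rb])
    {
        ec->has_const[rb] = true;
        ec->const_val[rb] = ec->const_val[ra];
    }
}

/* Record "col = value", e.g. the restriction t2.d = 1. */
void
eq_add_col_eq_const(EqClasses *ec, int col, int value)
{
    int r = eq_find(ec, col);

    ec->has_const[r] = true;
    ec->const_val[r] = value;
}

/* Return true and set *value if some constant is known to equal col. */
bool
eq_lookup_const(const EqClasses *ec, int col, int *value)
{
    int r = eq_find(ec, col);

    if (!ec->has_const[r])
        return false;
    *value = ec->const_val[r];
    return true;
}
```

With ON t2.d = t1.b, looking up COL_T1_B after adding both clauses finds the constant 1, which is what pruning needs. With ON t2.c = t1.b, the class {t2.c, t1.b} never receives a constant, so there is nothing to prune t1 with at plan time. The sketch ignores contradictory constants, which a real implementation would have to detect.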
BEGIN;
EXPLAIN (ANALYZE) UPDATE t1 SET a = 1 WHERE b = 1;
ROLLBACK;
BEGIN;
EXPLAIN (ANALYZE) DELETE FROM t1 WHERE b = 1;
ROLLBACK;
Should pruning of partitions for UPDATEs (where the partition key isn't
updated) and DELETEs be added to the TODO list?
Note that partition pruning *does* work for UPDATE and DELETE, but only if
you use list/range partitioning. The reason it doesn't work in this case
(t1 is hash partitioned) is that the pruning is still based on constraint
exclusion in the UPDATE/DELETE case and constraint exclusion cannot handle
hash partitioning.
See example below that uses list partitioning:
drop table t1, t2;
create table t1 (a int, b int) partition by list (b);
create table t1_p0 partition of t1 for values in (0);
create table t1_p1 partition of t1 for values in (1);
create table t1_p2 partition of t1 for values in (2);
create table t1_p3 partition of t1 for values in (3);
create table t2 (c int, d int) partition by list (d);
create table t2_p0 partition of t2 for values in (0);
create table t2_p1 partition of t2 for values in (1);
create table t2_p2 partition of t2 for values in (2);
create table t2_p3 partition of t2 for values in (3);
explain (costs off) update t1 set a = 1 where b = 1;
QUERY PLAN
-------------------------
Update on t1
Update on t1_p1
-> Seq Scan on t1_p1
Filter: (b = 1)
(4 rows)
explain (costs off) delete from t1 where b = 1;
QUERY PLAN
-------------------------
Delete on t1
Delete on t1_p1
-> Seq Scan on t1_p1
Filter: (b = 1)
(4 rows)
I can see how that seems a bit odd. If you use hash partitioning,
UPDATE/DELETE do not benefit from partition-pruning, even though SELECT
does. That's because SELECT uses the new partition-pruning method (this
patch set) which supports hash partitioning, whereas UPDATE and DELETE use
constraint exclusion which doesn't. It would be a good idea to make even
UPDATE and DELETE use the new method thus bringing everyone on the same
page, but that requires us to make some pretty non-trivial changes to how
UPDATE/DELETE planning works for inheritance/partitioned tables, which we
should undertake separately, imho.
Thanks,
Amit
Hi Amit,
On 11/24/2017 12:00 AM, Amit Langote wrote:
On 2017/11/23 3:56, Jesper Pedersen wrote:
EXPLAIN (ANALYZE) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER JOIN t2 ON
t2.c = t1.b WHERE t2.d = 1;
I just wanted to highlight that the "JOIN ON" partition isn't pruned - the
"WHERE" one is.
Did you mean to write ON t2.d = t1.b? If so, equivalence class mechanism
will give rise to a t1.b = 1 and hence help prune t1's partition as well:
No, I meant 't2.c = t1.b'. If you take the same example, but don't
partition you will get the following plan:
test=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER
JOIN t2 ON t2.c = t1.b WHERE t2.d = 1;
QUERY PLAN
----------------------------------------------
Nested Loop
-> Index Scan using idx_t2_d on t2
Index Cond: (d = 1)
-> Index Only Scan using idx_t1_b_a on t1
Index Cond: (b = t2.c)
(5 rows)
Maybe "5.10.2. Declarative Partitioning" could be expanded to include
some general "guidelines" of where partition based plans should be
checked against their non-partition counterparts (at least the first
bullet in 5.10.1 says ".. in certain situations .."). Probably a
separate patch from this.
[snip]
Should pruning of partitions for UPDATEs (where the partition key isn't
updated) and DELETEs be added to the TODO list?
Note that partition pruning *does* work for UPDATE and DELETE, but only if
you use list/range partitioning. The reason it doesn't work in this case
(t1 is hash partitioned) is that the pruning is still based on constraint
exclusion in the UPDATE/DELETE case and constraint exclusion cannot handle
hash partitioning.
Thanks for your description.
I can see how that seems a bit odd. If you use hash partitioning,
UPDATE/DELETE do not benefit from partition-pruning, even though SELECT
does. That's because SELECT uses the new partition-pruning method (this
patch set) which supports hash partitioning, whereas UPDATE and DELETE use
constraint exclusion which doesn't. It would be a good idea to make even
UPDATE and DELETE use the new method thus bringing everyone on the same
page, but that requires us to make some pretty non-trivial changes to how
UPDATE/DELETE planning works for inheritance/partitioned tables, which we
should undertake separately, imho.
Agreed.
Best regards,
Jesper
Hello,
At Wed, 22 Nov 2017 17:59:48 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <df609168-b7fd-4c0b-e9b2-6e398d411e27@lab.ntt.co.jp>
Thanks Rajkumar for the test.
On 2017/11/21 19:06, Rajkumar Raghuwanshi wrote:
explain select * from hp_tbl where a = 2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Fixed in the attached. No other changes beside that.
0001 and 0002 are under discussion with Robert in another thread.
I don't have a comment on 0003, 0004.
0005:
get_partitions_from_clauses is written as _using_ in its
comment. (Also, get_partitions_from_clauses_recurse is _guts in
its comment.)
get_append_rel_partitions just returns NIL if constfalse. I
suppose we'd better reduce the indentation level
here. get_partitions_from_clauses_recurse in 0006 does the same
thing.
In the same function, there's an else clause separated from the then
clause by a multiline comment. It seems better that the else
clause has braces and the comment is inside the braces, like the
following.
else
{
/*
* Else there are no clauses....
*/
partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
}
0006:
In get_partitions_from_clauses_recurse, the following comment
seems confusing.
+ /*
+ * The analysis of the matched clauses done by
+ * classify_partition_bounding_keys may have found mutually contradictory
+ * clauses.
+ */
constfalse is also true when the clause is just a single false
pseudoconstant restrictinfo.
+ if (!constfalse)
+ {
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys
+ * yet. They will be handled below by recursively calling this
+ * function for each of OR clauses' arguments and combining the
+ * resulting partition sets appropriately.
+ */
+ if (nkeys > 0)
classify_partition_bounding_keys() returns zero keys not only when
all the clauses are OR clauses (AND clauses containing a volatile
function also yield zero).
+ /* Set partexpr if needed. */
+ if (partattno == 0)
Could you add a description of the meaning of 0 to the
comment of PartitionKeyData, something like this?
| AttrNumber *partattrs; /* attribute numbers of columns in the
| * partition key. 0 means partexpression */
+ #define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) ||\
+ equal((expr), (partexpr)))
...
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
partattno = 0 has a different meaning than ordinary attnos.
I believe that the leftop cannot be a whole-row Var, but I
suppose we should make it clear. Likewise, even though it doesn't
actually happen, it formally has a chance of making a false
match, since partexpr is not cleared when partattno > 0.
EXPR_MATCHES_PARTKEY might be better as something like the following.
| #define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
| ((partattno) != 0 ? \
| (IsA((expr), Var) && ((Var *) (expr))->varattno == (partattno)) :\
| equal((expr), (partexpr)))
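The two concerns above can be illustrated with a small self-contained sketch. It assumes toy stand-ins (ToyExpr, toy_equal) in place of the real Node, Var and equal(), so it only mirrors the shape of the two macro variants rather than the patch's code. A whole-row Var is modelled as a TOY_VAR with varattno 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for PostgreSQL node types; only what the example needs. */
typedef enum { TOY_VAR, TOY_FUNC } ToyTag;

typedef struct ToyExpr
{
    ToyTag tag;
    int    varattno;  /* meaningful for TOY_VAR; 0 models a whole-row Var */
    int    funcid;    /* meaningful for TOY_FUNC */
} ToyExpr;

/* Simplified equal(): field-by-field comparison, NULL-safe. */
bool
toy_equal(const ToyExpr *a, const ToyExpr *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return a->tag == b->tag &&
           a->varattno == b->varattno &&
           a->funcid == b->funcid;
}

/* Shape of the macro as posted: both tests are tried regardless of partattno. */
bool
matches_partkey_posted(const ToyExpr *expr, int partattno,
                       const ToyExpr *partexpr)
{
    assert(expr != NULL);
    return (expr->tag == TOY_VAR && expr->varattno == partattno) ||
           toy_equal(expr, partexpr);
}

/* Shape of the suggested macro: partattno decides which single test applies. */
bool
matches_partkey_suggested(const ToyExpr *expr, int partattno,
                          const ToyExpr *partexpr)
{
    assert(expr != NULL);
    return partattno != 0
        ? (expr->tag == TOY_VAR && expr->varattno == partattno)
        : toy_equal(expr, partexpr);
}
```

Given a plain-column key (partattno = 2) and a partexpr left over from an earlier expression key, a clause operand equal to that stale expression matches under the posted shape but not under the suggested one. Likewise, with partattno = 0, a whole-row Var matches under the posted shape only because its varattno happens to be 0.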
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
...
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
This seems to me to rely a bit too much on the specific
relationship between the access methods' properties. Wouldn't it
be reasonable to add a check that partkey->strategy != 'h'
before getting the negator?
+ commuted->opno = get_commutator(opclause->opno);
I'm afraid that get_commutator can return InvalidOid for
user-defined types, for user-defined operator classes, or perhaps
for other reasons unknown to me. match_clauses_to_partkey is
checking that.
+ else if (IsA(clause, ScalarArrayOpExpr))
I'm not sure what to do with a multidimensional ArrayExpr, but
->multidims is checked in some places.
+ ParseState *pstate = make_parsestate(NULL);
make_parsestate requires the caller to free it with
free_parsestate(). It doesn't seem to leak anything in this
context, and I saw the same thing in other places, but it would be
better to follow the rule if possible, or add an explanatory
comment. (Or update the comment of make_parsestate?)
+ * If the leftarg_const and rightarg_consr are both of the type expected
rightarg_consr -> const
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ if (partition_cmp_args(partkey, partattoff, ge, gt, ge,
+ &test_result))
Please unify the style.
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
I noticed that bare true and false are not accepted by the
values list of create table syntax. This is not a comment on
this patch but is that intentional?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi Jesper.
On 2017/11/28 3:30, Jesper Pedersen wrote:
Hi Amit,
On 11/24/2017 12:00 AM, Amit Langote wrote:
On 2017/11/23 3:56, Jesper Pedersen wrote:
EXPLAIN (ANALYZE) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER JOIN t2 ON
t2.c = t1.b WHERE t2.d = 1;
I just wanted to highlight that the "JOIN ON" partition isn't pruned - the
"WHERE" one is.
Did you mean to write ON t2.d = t1.b? If so, equivalence class mechanism
will give rise to a t1.b = 1 and hence help prune t1's partition as well:
No, I meant 't2.c = t1.b'. If you take the same example, but don't
partition you will get the following plan:
test=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER
JOIN t2 ON t2.c = t1.b WHERE t2.d = 1;
QUERY PLAN
----------------------------------------------
Nested Loop
-> Index Scan using idx_t2_d on t2
Index Cond: (d = 1)
-> Index Only Scan using idx_t1_b_a on t1
Index Cond: (b = t2.c)
(5 rows)
So it appears to me that you're pointing out the inner Index Only Scan on
t1, which is a lot better than scanning all of t1 on every loop iteration.
As you might know, we can't exactly have the index scan on partitioned
table (that is, the parent table itself), because there wouldn't be any
index on it. However, the planner is smart enough to push the clause down
to partitions (leaf tables) which may have the index and hence index scan
could be chosen for them. But note that planner will have chosen *all*
partitions, because there is no constant value to prune partitions with at
that point.
If we get run-time pruning [1] (https://commitfest.postgresql.org/15/1330/),
we would get fairly close to what happens
in the non-partitioned case. In this case, since t1.b of t2.c = t1.b is
the partition key of t1, we will make an Append node with run-time pruning
enabled. On every loop iteration, t2.c's value will be used to prune
useless partitions, which will leave us in most cases to scan just one
partition and it might be an Index Only Scan using the partition's index.
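As a rough illustration of the per-iteration effect described above, here is a toy sketch that counts inner-side partition scans in a nested loop, with and without run-time pruning. Everything in it is an assumption made for illustration: the routing rule is simply key % 4, whereas real hash partitioning hashes the key with the partition key's hash support function first, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MODULUS 4

/*
 * Toy routing rule: remainder of the key itself.  This is NOT how
 * PostgreSQL maps a hash partition key to a partition.
 */
int
toy_partition_for_key(unsigned int key)
{
    return (int) (key % TOY_MODULUS);
}

/*
 * Count inner-side partition scans for a nested loop driven by outer_keys
 * (the values of t2.c in the thread's example).  With run-time pruning,
 * each outer row scans only the partition its key routes to; without it,
 * each outer row scans all TOY_MODULUS partitions.  Returns the total and
 * fills scans_per_part with the per-partition counts.
 */
size_t
toy_inner_scans(const unsigned int *outer_keys, size_t nkeys,
                int runtime_pruning, size_t scans_per_part[TOY_MODULUS])
{
    size_t total = 0;

    assert(outer_keys != NULL || nkeys == 0);

    for (int p = 0; p < TOY_MODULUS; p++)
        scans_per_part[p] = 0;

    for (size_t i = 0; i < nkeys; i++)
    {
        if (runtime_pruning)
        {
            scans_per_part[toy_partition_for_key(outer_keys[i])]++;
            total++;
        }
        else
        {
            for (int p = 0; p < TOY_MODULUS; p++)
            {
                scans_per_part[p]++;
                total++;
            }
        }
    }
    return total;
}
```

For three outer rows, the unpruned loop performs twelve partition scans while the pruned one performs three, which is the sense in which the plan approaches the non-partitioned Index Only Scan case.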
Maybe "5.10.2. Declarative Partitioning" could be expanded to include some
general "guidelines" of where partition based plans should be checked
against their non-partition counterparts (at least the first bullet in
5.10.1 says ".. in certain situations .."). Probably a separate patch from
this.
I agree about shedding more light on that in the documentation. I will
try to write up a patch someday.
Thanks,
Amit
On Wed, Nov 22, 2017 at 3:59 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Committed 0003 with some adjustments:
* Renamed the new test to partition_prune.
* Moved the test to what I thought was a better place in the schedule
file, and made it consistent between serial_schedule and
parallel_schedule.
* commutates -> commuted
* removed wrong /* empty */ comment
* Updated expected output. It surprised me a bit that the tests
weren't passing as you had them, but the differences I got - all
related to mc3p_default - seemed correct to me
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 29, 2017 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 22, 2017 at 3:59 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Committed 0003 with some adjustments:
* Renamed the new test to partition_prune.
* Moved the test to what I thought was a better place in the schedule
file, and made it consistent between serial_schedule and
parallel_schedule.
* commutates -> commuted
* removed wrong /* empty */ comment
* Updated expected output. It surprised me a bit that the tests
weren't passing as you had them, but the differences I got - all
related to mc3p_default - seemed correct to me
Committed 0004 after reviewing the code and testing that it seems to
work as advertised.
0005 looks like it might need to be split into smaller patches. More
broadly, the commit messages you wrote for 0005, 0006, and 0008
don't seem to me to do a great job explaining the motivation for the
changes which they make. They tell me what the patches do, but not
why they are doing it. If there's an email in this thread that
explains that stuff, please point me to it and I'll go back and reread
it more carefully; if not, I think I definitely need some more
explanation both of the mission of each patch and the reason why the
patch set is divided up in the way that it is.
Thanks,
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2017/11/30 5:28, Robert Haas wrote:
On Wed, Nov 22, 2017 at 3:59 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Committed 0003 with some adjustments:
* Renamed the new test to partition_prune.
* Moved the test to what I thought was a better place in the schedule
file, and made it consistent between serial_schedule and
parallel_schedule.
* commutates -> commuted
* removed wrong /* empty */ comment
Thanks a lot.
* Updated expected output. It surprised me a bit that the tests
weren't passing as you had them, but the differences I got - all
related to mc3p_default - seemed correct to me
Yeah, that one I too noticed yesterday while rebasing.
Thanks,
Amit
On 2017/11/30 7:15, Robert Haas wrote:
On Wed, Nov 29, 2017 at 3:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 22, 2017 at 3:59 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
It seems I wrote an Assert in the code to support hash partitioning that
wasn't based on a valid assumption. I was wrongly assuming that all hash
partitions for a given modulus (largest modulus) must exist at any given
time, but that isn't the case.
Committed 0003 with some adjustments:
* Renamed the new test to partition_prune.
* Moved the test to what I thought was a better place in the schedule
file, and made it consistent between serial_schedule and
parallel_schedule.
* commutates -> commuted
* removed wrong /* empty */ comment
* Updated expected output. It surprised me a bit that the tests
weren't passing as you had them, but the differences I got - all
related to mc3p_default - seemed correct to me
Committed 0004 after reviewing the code and testing that it seems to
work as advertised.
Thank you.
0005 looks like it might need to be split into smaller patches. More
broadly, the commit messages you wrote for 0005, 0006, and 0008
don't seem to me to do a great job explaining the motivation for the
changes which they make. They tell me what the patches do, but not
why they are doing it. If there's an email in this thread that
explains that stuff, please point me to it and I'll go back and reread
it more carefully; if not, I think I definitely need some more
explanation both of the mission of each patch and the reason why the
patch set is divided up in the way that it is.
I'm working on a revised version of these patches to address recent
comments by Horiguchi-san. I will also consider the points above before
sending the new version.
Thanks,
Amit
On Thu, Nov 30, 2017 at 10:43 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I'm working on a revised version of these patches to address recent
comments by Horiguchi-san. I will also consider the points above before
sending the new version.
Ok, this is fresh news, so I am moving this entry to next CF with
waiting on author as status.
--
Michael
On 2017/11/30 11:18, Michael Paquier wrote:
On Thu, Nov 30, 2017 at 10:43 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I'm working on a revised version of these patches to address recent
comments by Horiguchi-san. I will also consider the points above before
sending the new version.
Ok, this is fresh news, so I am moving this entry to next CF with
waiting on author as status.
That's correct, thanks.
Regards,
Amit
On 30 November 2017 at 11:15, Robert Haas <robertmhaas@gmail.com> wrote:
Committed 0004 after reviewing the code and testing that it seems to
work as advertised.
0005 looks like it might need to be split into smaller patches. More
broadly, the commit messages you wrote for 0005, 0006, and 0008
don't seem to me to do a great job explaining the motivation for the
changes which they make. They tell me what the patches do, but not
why they are doing it. If there's an email in this thread that
explains that stuff, please point me to it and I'll go back and reread
it more carefully; if not, I think I definitely need some more
explanation both of the mission of each patch and the reason why the
patch set is divided up in the way that it is.
Hi Amit,
It looks like just 0005 to 0008 remain of this and I see that the v13
0005 patch no longer applies to current master.
Are you working on splitting this up as requested by Robert above?
I can continue reviewing this once patches are available that apply to
current master.
Many thanks
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2017/12/07 19:48, David Rowley wrote:
On 30 November 2017 at 11:15, Robert Haas <robertmhaas@gmail.com> wrote:
Committed 0004 after reviewing the code and testing that it seems to
work as advertised.
0005 looks like it might need to be split into smaller patches. More
broadly, the commit messages you wrote for 0005, 0006, and 0008
don't seem to me to do a great job explaining the motivation for the
changes which they make. They tell me what the patches do, but not
why they are doing it. If there's an email in this thread that
explains that stuff, please point me to it and I'll go back and reread
it more carefully; if not, I think I definitely need some more
explanation both of the mission of each patch and the reason why the
patch set is divided up in the way that it is.
Hi Amit,
It looks like just 0005 to 0008 remain of this and I see that the v13
0005 patch no longer applies to current master.
Are you working on splitting this up as requested by Robert above?
I can continue reviewing this once patches are available that apply to
current master.
I'm still working on that. I will be able to submit a new version
sometime early in the next week, that is, if I don't manage to do it by
today evening (Japan time). Sorry that it's taking a bit longer.
Thanks,
Amit
Horiguchi-san,
Thanks a lot for the review and sorry it took me a while to reply.
On 2017/11/28 20:39, Kyotaro HORIGUCHI wrote:
At Wed, 22 Nov 2017 17:59:48 +0900, Amit Langote wrote:
0001 and 0002 are under discussion with Robert in another thread.
And now committed [1].
I don't have a comment on 0003, 0004.
0005:
get_partitions_from_clauses is written as _using_ in its
comment. (also get_partitions_from_clauses_recurse is _guts in
its comment.)
Fixed both.
get_append_rel_partitions just returns NIL if constfalse. I
suppose we'd better reducing indentation level
here. get_partitions_from_clauses_recurse in 0006 does the same
thing.
Less indentation sounds good to me too, so fixed.
In the same function, there's a else clause separated from then
clause by a multiline comment. It seems better that the else
clause has braces and the comment is in the braces like the
following.
else
{
/*
* Else there are no clauses....
*/
partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
}
Done that way.
0006:
In get_partitions_from_clauses_recurse, the following comment
seems confusing.
+ /*
+  * The analysis of the matched clauses done by
+  * classify_partition_bounding_keys may have found mutually contradictory
+  * clauses.
+  */
constfalse = true also when the clause was just one false pseudo
constant restrictinfo.
Updated the comment like this:
/*
* classify_partition_bounding_keys() may have found clauses marked
* pseudo-constant that are false that the planner didn't or it may have
* itself found contradictions among clauses.
*/
+ if (!constfalse)
+ {
+     /*
+      * If all clauses in the list were OR clauses,
+      * classify_partition_bounding_keys() wouldn't have formed keys
+      * yet. They will be handled below by recursively calling this
+      * function for each of OR clauses' arguments and combining the
+      * resulting partition sets appropriately.
+      */
+     if (nkeys > 0)
classify_p_b_keys() can return zero not only when all clauses are
OR clauses (all-AND clauses with a volatile function also return
zero).
Hmm, if all AND clauses contained volatile functions, then planner
wouldn't have called here at all. Also, any clauses it passes to
get_partitions_from_clauses() are known to be OK to use for pruning.
+ /* Set partexpr if needed. */
+ if (partattno == 0)
Could you add a description about the meaning of 0 to the
comment of PartitionKeyData something like this?
Sure, done.
| AttrNumber *partattrs; /* attribute numbers of columns in the
| * partition key. 0 means partexpression */
+ #define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+     ((IsA((expr), Var) &&\
+     ((Var *) (expr))->varattno == (partattno)) ||\
+     equal((expr), (partexpr)))
...
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
partattno = 0 has a different meaning than ordinary attnos.
I believe that the leftop cannot be a whole row var, but I
suppose we should make it clear. Likewise, even though it doesn't
actually happen, the macro formally has a chance to make a false
match since partexpr is not cleared when partattno > 0.
EXPR_MATCHES_PARTKEY might be better as something like the following.
| #define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
| ((partattno) != 0 ? \
| (IsA((expr), Var) && ((Var *) (expr))->varattno == (partattno)) :\
| equal((expr), (partexpr)))
That's better, fixed.
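To illustrate why the conditional form is safer, here is a small standalone sketch. It is not code from the patch: the Expr struct, expr_equal(), matches_old() and matches_new() are hypothetical stand-ins for PostgreSQL's node types, equal(), and the two macro variants quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL node types (assumption). */
typedef enum { NODE_VAR, NODE_FUNCEXPR } NodeKind;

typedef struct Expr
{
	NodeKind	kind;
	int			varattno;	/* meaningful only for NODE_VAR */
	const char *text;		/* printed form, used as a crude equal() */
} Expr;

/* Crude structural equality, standing in for equal(). */
static int
expr_equal(const Expr *a, const Expr *b)
{
	return a != NULL && b != NULL &&
		a->kind == b->kind &&
		a->varattno == b->varattno &&
		strcmp(a->text, b->text) == 0;
}

/* Original shape: Var match OR expression match, whatever partattno is. */
static int
matches_old(const Expr *expr, int partattno, const Expr *partexpr)
{
	return (expr->kind == NODE_VAR && expr->varattno == partattno) ||
		expr_equal(expr, partexpr);
}

/*
 * Suggested shape: partattno decides which test applies, so a partexpr
 * left over from an earlier key column can never produce a match when the
 * current key column is a plain attribute.
 */
static int
matches_new(const Expr *expr, int partattno, const Expr *partexpr)
{
	return partattno != 0 ?
		(expr->kind == NODE_VAR && expr->varattno == partattno) :
		expr_equal(expr, partexpr);
}
```

With a stale partexpr of f(a) and a plain-column key (partattno = 2), matches_old() accepts a clause operand f(a) while matches_new() rejects it, which is the formal false match mentioned above.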
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
...
+     negator = get_negator(opclause->opno);
+     if (OidIsValid(negator) &&
+         op_in_opfamily(negator, partopfamily))
+     {
+         get_op_opfamily_properties(negator, partopfamily,
+                                    false,
+                                    &strategy,
+                                    &lefttype, &righttype);
+         if (strategy == BTEqualStrategyNumber)
This seems to me to rely a bit too much on the specific
relationship of the access methods' property. Isn't it
reasonable to add a check that partkey->strategy != 'h'
before getting negator?
+ commuted->opno = get_commutator(opclause->opno);
I'm afraid that get_commutator can return InvalidOid for
user-defined types or by user-defined operator class or perhaps
other reasons uncertain to me. match_clauses_to_partkey is
checking that.
+ else if (IsA(clause, ScalarArrayOpExpr))
I'm not sure what to do with a multidimensional ArrayExpr but
->multidims is checked some places.
In another thread that I recently started [2], Tom seemed to point out
that ScalarArrayOpExpr cannot really be used for comparing an array on LHS
with, say, an array of arrays on RHS. The LHS expression must yield a
scalar value which is compared with individual scalar values in the array
on RHS. The syntax allows arbitrarily multi-dimensional array to be
specified on RHS, but ultimately only looks at the scalar therein. For
example:
select 1 = any (array[array[array[1]]]);
 ?column?
----------
 t
(1 row)
select 1 = any ('{{{1}}}');
 ?column?
----------
 t
(1 row)
select 1 = any ('{{{2}}}');
 ?column?
----------
 f
(1 row)
But for the code in question on which you seemed to have commented, it
only works if the array is presented as a Const that can be deconstructed
to a list of scalar values using deconstruct_array(). If the array was
presented as an ArrayExpr, and its multidims is true, the elements list
cannot simply be assumed to contain Const nodes holding scalar values.
So, we should prohibit that case from proceeding. Although, I haven't
been able to frame a test query that results in such a case.
+ ParseState *pstate = make_parsestate(NULL);
make_parsestate mandates for the caller to free it by
free_parsestate(). It doesn't seem to leak anything in the
context and I saw the same thing at other places but it would be
better to follow it if possible, or add some apology as a
comment.. (or update the comment of make_parsestate?)
Added a free_parsestate() call.
+ * If the leftarg_const and rightarg_consr are both of the type expected
rightarg_consr -> const
Fixed.
+ if (partition_cmp_args(partkey, partattoff,
+                        le, lt, le,
+                        &test_result))
+ if (partition_cmp_args(partkey, partattoff, ge, gt, ge,
+                        &test_result))
Please unify the style.
Fixed.
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
I noticed that bare true and false are not accepted by the
values list of create table syntax. This is not a comment on
this patch but is that intentional?
Hmm, I guess the original partitioning patch forgot to accept TRUE_P,
FALSE_P as valid partition bound datums along with Sconst, NumericOnly,
and NULL_P. Sent a patch for that in a separate thread [3].
Attached updated patches. As Robert commented [4], I tried to re-arrange
the patches after breaking down the planner patch ("0005 looks like it
might need to be split into smaller patches"). Brief description of each
follows:
[PATCH 1/5] Some interface changes for partition_bound_{cmp/bsearch}
As the name says, it's a preparatory patch to enable the next patch to use
the partition bound binary search function using partial keys. Until now,
callers needed to specify the whole key (specify values for all partition
columns), but a query may not have clauses on all partition columns, so we
should be able to compare such incomplete keys against partition bounds.
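As a rough illustration of searching with a partial key, here is a standalone sketch. It assumes integer key columns and a sorted array of two-column range bounds; bound_prefix_cmp() and bound_bsearch() are hypothetical names, and only the lo/hi/mid search shape is modelled on partition_bound_bsearch(). The probe supplies values for just a prefix of the key columns, and only that prefix is compared.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 2				/* hypothetical number of key columns */

/*
 * Compare the first nvalues columns of a bound with a (possibly partial)
 * probe key.  Returns <0, 0 or >0 as the bound is less than, equal to or
 * greater than the probe on the compared prefix.
 */
static int
bound_prefix_cmp(const int *bound, const int *values, int nvalues)
{
	int			i;

	for (i = 0; i < nvalues; i++)
	{
		if (bound[i] < values[i])
			return -1;
		if (bound[i] > values[i])
			return 1;
	}
	return 0;
}

/*
 * Binary search over bounds sorted in ascending order.  Returns the index
 * of a bound that is <= the probe on the compared prefix (the greatest
 * such bound, or the first equal one the search hits), or -1 if every
 * bound is greater.  *is_equal reports whether that bound equals the
 * probe on the prefix.
 */
static int
bound_bsearch(const int (*bounds)[MAX_KEYS], int nbounds,
			  const int *values, int nvalues, int *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nvalues >= 1 && nvalues <= MAX_KEYS);
	*is_equal = 0;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = bound_prefix_cmp(bounds[mid], values, nvalues);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

A query with a clause on only the first key column would call this with nvalues = 1, whereas tuple routing would always pass values for every key column.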
[PATCH 2/5] Introduce a get_partitions_from_clauses()
This used to be the last patch in the previous versions. But I moved it
ahead in the list as basic functionality that needs to be in place before
starting to modify the planner to use the same for faster pruning.
To summarize, just like the existing get_partition_for_tuple() that
receives a tuple from ExecFindPartition() and returns the index of the
partition that should contain that tuple, get_partitions_from_clauses()
will receive a set of clauses that are all known to match some partition
key and derive from it and return the set of indexes of partitions that
satisfy all those clauses.
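A much simplified sketch of the idea follows, assuming a single integer key column and four range partitions. The Clause struct, part_lower[] and both functions are hypothetical; the real patch first reduces the clauses to bounding keys. What it shows is how the partition sets are combined: the sets for the arguments of an OR clause are unioned, and the sets for the mutually ANDed clauses are intersected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPARTS 4

/* Hypothetical layout: partition i holds keys in [part_lower[i], part_lower[i+1]). */
static const int part_lower[NPARTS + 1] = {0, 10, 20, 30, 40};

typedef enum { OP_LT, OP_GE, OP_EQ, OP_OR } ClauseKind;

typedef struct Clause
{
	ClauseKind	kind;
	int			value;			/* constant for OP_LT, OP_GE, OP_EQ */
	const struct Clause *args;	/* arguments of an OP_OR clause */
	int			nargs;
} Clause;

/* Set of partitions that can contain a key satisfying one simple clause. */
static uint32_t
parts_for_simple_clause(const Clause *c)
{
	uint32_t	set = 0;
	int			i;

	for (i = 0; i < NPARTS; i++)
	{
		int			lo = part_lower[i];
		int			hi = part_lower[i + 1];
		int			keep = 1;

		if (c->kind == OP_LT)
			keep = (lo < c->value);
		else if (c->kind == OP_GE)
			keep = (hi > c->value);
		else if (c->kind == OP_EQ)
			keep = (lo <= c->value && c->value < hi);
		if (keep)
			set |= (uint32_t) 1 << i;
	}
	return set;
}

/* Partitions satisfying all of the (implicitly ANDed) clauses. */
static uint32_t
prune_partitions(const Clause *clauses, int nclauses)
{
	uint32_t	result = ((uint32_t) 1 << NPARTS) - 1;	/* start with all */
	int			i,
				j;

	for (i = 0; i < nclauses; i++)
	{
		const Clause *c = &clauses[i];

		if (c->kind == OP_OR)
		{
			uint32_t	or_set = 0;

			/* OR arguments: combine using set union */
			for (j = 0; j < c->nargs; j++)
				or_set |= prune_partitions(&c->args[j], 1);
			result &= or_set;
		}
		else
			result &= parts_for_simple_clause(c);	/* AND: intersection */
	}
	return result;
}
```

For example, a >= 10 AND (a = 5 OR a = 25) leaves only the partition covering [20, 30), since a = 5 falls in a partition already excluded by a >= 10.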
[PATCH 3/5] Move some code of set_append_rel_size to separate
function
First in the series that modifies the planner. Just a preparatory patch
that moves some code.
[PATCH 4/5] More refactoring around partitioned table AppendPath
creation
Another refactoring patch that changes how we manipulate the partitions
when generating AppendPath for partitioned tables.
Actually, there is one behavior change here - partitioned_rels in Append
today carry the RT indexes of even the partitioned child tables that have
been pruned. Patch modifies things so that that doesn't happen anymore.
[PATCH 5/5] Teach planner to use get_partitions_from_clauses()
With this patch, we finally hit the new faster pruning functionality.
Partitions will be pruned even before looking at the individual partition
RelOptInfos. In fact, set_append_rel_size() now only ever looks at
non-pruned partitions.
Since we can call get_partitions_from_clauses() only from the SELECT
planning code path, UPDATE/DELETE cases still rely on constraint
exclusion. So, we retrieve the partition constraint only in those cases.
Thanks,
Amit
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7b88d63a9122646ead60c1afffc248a31d4e457d
[2]: /messages/by-id/7677.1512743642@sss.pgh.pa.us
[3]: /messages/by-id/e05c5162-1103-7e37-d1ab-6de3e0afaf70@lab.ntt.co.jp
[4]: /messages/by-id/CA+TgmoYYrPA21e0y5w2NW2-sbANFR4n2nbrSWEWjzvaa_GNi0g@mail.gmail.com
Attachments:
0001-Some-interface-changes-for-partition_bound_-cmp-bsea-v14.patch (text/plain; charset=UTF-8)
From 98b6786481b689d26092a24170a8f70a49cd052b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 1/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ef156e449e..06b8f6ed7d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -984,6 +1010,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -998,8 +1026,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1072,10 +1106,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1126,6 +1166,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1147,8 +1188,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1162,9 +1206,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2533,12 +2577,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2565,11 +2612,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2777,12 +2828,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2804,11 +2855,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2817,25 +2868,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2846,12 +2927,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2865,20 +2947,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2891,8 +2972,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
0002-Introduce-a-get_partitions_from_clauses-v14.patch (text/plain; charset=UTF-8)
From 29e1a42cbba273373f17aa34f2b0ee1619a9e19c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
---
src/backend/catalog/partition.c | 1670 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 1679 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 06b8f6ed7d..7e3a777695 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,69 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +278,25 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static int32 partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1577,9 +1663,1593 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses);
+
+ /*
+ * classify_partition_bounding_keys() may have found clauses marked
+ * pseudo-constant that are false that the planner didn't or it may have
+ * itself found contradictions among clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys yet. They
+ * will be handled below by recursively calling this function for each of
+ * OR clauses' arguments and combining the resulting partition sets
+ * appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ Bitmapset *or_partset = NULL;
+
+ foreach(lc1, or->args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc1));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to or_partset. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partition.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation,
+ rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ or_partset = bms_union(or_partset, arg_partset);
+ }
+
+ /*
+ * Partition sets obtained from mutually-conjunctive clauses are
+ * combined using set intersection.
+ */
+ result = bms_intersect(result, or_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, given a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing an equality operator, unless hash
+ * partitioning is in use, in which case it's possible that some keys have
+ * IS NULL clauses while the remaining ones have clauses with an equality
+ * operator. Min and max bounds could contain bound values for only a prefix
+ * of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_or_clauses = true;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to an arbitrary expression, which we fetch
+ * from the PartitionKey's list of expressions.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is '<>', check if its
+ * negator is an equality operator. If so and it's a btree
+ * equality operator, we can use a special trick to prune
+ * partitions that won't satisfy the original '<>'
+ * operator -- we generate an OR expression
+ * 'leftop < rightop OR leftop > rightop' and add it to
+ * *or_clauses.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ {
+ Expr *or;
+
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ or = makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr), -1);
+ *or_clauses = lappend(*or_clauses, or);
+ continue;
+ }
+ }
+
+ /*
+ * Getting here means opclause uses an ordering op and
+ * hash partitioning is in use. We shouldn't try to
+ * reason about such an operator for the purposes of
+ * partition pruning, because hash partitioning doesn't
+ * make partitioning decisions based on relative ordering
+ * of keys.
+ */
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = get_commutator(opclause->opno);
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+
+ /*
+ * Since we only allow strict operators, require keys to be
+ * not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is not
+ * listed in any operator family. We can still handle it if
+ * its negator is a member of the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
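+ /*
+ * get_op_opfamily_properties() raises an error for an
+ * operator that is not in the family, so check first.
+ */
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;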
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /*
+ * Use a separate counter here; i indexes the partition key
+ * columns in the enclosing loop and must not be clobbered.
+ */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ only_or_clauses = false;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_or_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ int32 op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ if (op_strategy < 0 &&
+ need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ else if (op_strategy == 0)
+ {
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ }
+ else if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found one for every partition key column.
+ * If present, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
+ * =, and >/>= operator, respectively. Sets *incl to true if equality is
+ * implied.
+ */
+static int32
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ int32 result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = 0;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = -1;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = 0;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = 1;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key type, that is, the type of
+ * datums stored in PartitionBoundInfo, no hope of using this
+ * expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * add this one directly to the result. The caller will arbitrarily
+ * choose one of the clauses and perform partition pruning with it.
+ * The contradiction may still be proved at some higher level; we
+ * just couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5 is true, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Bitmapset *result = NULL;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool hash_isnull[PARTITION_MAX_KEYS];
+
+ /* Return an empty set if no partitions to see. */
+ if (partdesc->nparts == 0)
+ return NULL;
+
+ memset(hash_isnull, false, sizeof(hash_isnull));
+ /* Handle null partition keys. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ switch (partkey->strategy)
+ {
+ /*
+ * Hash partitioning puts nulls into a normal partition and
+ * doesn't require defining a special null-accepting
+ * partition. So, we let this fall through to be handled by
+ * the code below that handles equality keys.
+ */
+ case PARTITION_STRATEGY_HASH:
+ hash_isnull[i] = true;
+ keys->n_eqkeys++;
+ break;
+
+ /*
+ * In range and list partitioning cases, only a designated
+ * partition will accept nulls.
+ */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ result = bms_make_singleton(other_idx);
+ return result;
+ }
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exists a
+ * partition, the latter must be a partition that accepts only nulls
+ * or a default partition. If it is the former and we didn't already
+ * return it as the only scannable partition, that means the query
+ * doesn't want null values in its output. So, all of what the query
+ * wants instead must be in the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+ return result;
+ }
+
+
+ /*
+ * Determine set of partitions using provided keys, which proceeds in a
+ * manner determined by the partitioning method.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ switch (partkey->strategy)
+ {
+ /* Hash-partitioning is real simple. */
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys,
+ hash_isnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ result = bms_make_singleton(result_index);
+
+ return result;
+ }
+
+ /* Range and list partitioning take a bit more work. */
+
+ case PARTITION_STRATEGY_LIST:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /* For list partition, must exactly match the datum. */
+ if (eqoff >= 0 && !is_equal)
+ eqoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+ /*
+ * eqoff gives us the bound that is known to be <=
+ * eqkeys given how partition_bound_bsearch works. The
+ * bound at eqoff + 1, then, would be the upper bound of
+ * the only partition that needs to be scanned.
+ */
+ if (eqoff >= 0)
+ eqoff += 1;
+ break;
+ }
+
+ /*
+ * Use the partition that eqkeys identified, if any. Otherwise, fall
+ * back to the default partition (if one exists), because eqkeys
+ * didn't identify a specific partition or identified a range of
+ * unassigned values.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff] >= 0)
+ result = bms_make_singleton(boundinfo->indexes[eqoff]);
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(boundinfo->default_index);
+
+ /* There are no minkeys and maxkeys when eqkeys is valid. */
+ return result;
+ }
+
+ /*
+ * Hash partitioning doesn't understand non-equality conditions, so
+ * return all partitions.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ return result;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case,
+ * set minoff to the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if
+ * it does but minkey isn't inclusive, move to the bound on
+ * the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the index of the last valid list
+ * partition datum.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If only a prefix of the whole partition key is provided,
+ * there will be multiple partitions whose bounds share the
+ * same prefix. If minkey is inclusive, we must make minoff
+ * point to the leftmost such bound, making the result contain
+ * all such partitions. If it is exclusive, we must move
+ * minoff to the right such that minoff points to the first
+ * partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in
+ * the result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ minoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is
+ * known to be <= query's minkey. The bound at minoff + 1 (if
+ * there is one), then, would be the upper bound of the
+ * leftmost partition that needs to be scanned.
+ */
+ minoff += 1;
+ break;
+ }
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_RANGE)
+ /* 1 more index than datums in this case */
+ maxoff = boundinfo->ndatums;
+ else
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged,
+ * because it simply means none of the partitions satisfies
+ * maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal),
+ * but the maxkey is not inclusive, then go to the bound on
+ * left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+
+ /*
+ * maxoff may have become -1, which again means no partition
+ * satisfies the maxkeys.
+ */
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ maxoff, &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is
+ * known to be <= query's maxkey. The bound at maxoff + 1,
+ * then, would be the upper bound of the rightmost partition
+ * that needs to be scanned. Although, if the bound is equal
+ * to maxkeys and the latter is not inclusive, then the bound
+ * at maxoff itself is the upper bound of the rightmost
+ * partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+
+ break;
+ }
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool list_include_def = false,
+ range_include_def = false;
+
+ switch (partkey->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * All datums between those at minoff and maxoff satisfy the
+ * query keys, so add the corresponding partitions to the
+ * result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list
+ * partition. Because list partitions divide the key space
+ * in a discontinuous manner, not all values in the given
+ * range will have a partition assigned.
+ */
+ list_include_def = true;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper
+ * bound of an unassigned range of values, move to the
+ * adjacent bound which must be the upper bound of the
+ * leftmost or rightmost partition, respectively, that needs
+ * to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do
+ * indeed satisfy the query, but don't have a valid partition
+ * assigned. The default partition would've been included to
+ * cover those values. Although, if the original bound in
+ * question is an infinite value, there would not be any
+ * unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the
+ * default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ {
+ range_include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any
+ * non-default range partition between the datums at
+ * minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition
+ * if certain keys could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ range_include_def = true;
+ break;
+ }
+ }
+ }
+
+ break;
+ }
+
+ if ((list_include_def || range_include_def) &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6a2d5ad760..ce83fbcb22 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4745,7 +4743,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2983cfa217..7a5ab45c5c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
0003-Move-some-code-of-set_append_rel_size-to-separate-fu-v14.patch
From a708903b50a49f5dba2fc9029abb2da28fa866c1 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition's
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. The pairwise-join related code will need to call
it to minimally initialize the partitions that earlier planning would
have considered pruned and hence left untouched. That does not happen
currently, because the current pruning method touches each partition
(setting its basic properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 47986ba80a..bba6d09091 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 674cfc6b06..daa8f516ce 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 99f65b44f2..00c134d5a3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -299,5 +299,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
0004-More-refactoring-around-partitioned-table-AppendPath-v14.patch
From 901858f25b2e5faee82ac7ae9ac18e40d5b32f4e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 129 ++++++++++++++++++++++------------
src/backend/optimizer/plan/planner.c | 19 +++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 +++++-
4 files changed, 128 insertions(+), 52 deletions(-)
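As a rough illustration of the first change described in the commit message above (a toy sketch, not code from the patch): instead of scanning one global list and filtering by parent, each parent keeps its own array of child entries in bound order. ToyAppInfo, ToyRel, collect_by_scan and collect_from_array are hypothetical stand-ins for AppendRelInfo, RelOptInfo and the loops over root->append_rel_list and rel->part_appinfos.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for AppendRelInfo: one parent/child link. */
typedef struct ToyAppInfo
{
	int			parent_relid;
	int			child_relid;
} ToyAppInfo;

/* Hypothetical stand-in for a partitioned parent's RelOptInfo. */
typedef struct ToyRel
{
	int			relid;
	int			nparts;
	ToyAppInfo **part_appinfos;	/* one entry per partition, in bound order */
} ToyRel;

/*
 * Old approach: walk the global list of all parent/child links and keep
 * only those belonging to the given parent.  Cost grows with the total
 * number of links in the query, not with this parent's partition count.
 */
static int
collect_by_scan(const ToyAppInfo *all, int nall, int parent_relid, int *out)
{
	int			n = 0;
	int			i;

	for (i = 0; i < nall; i++)
	{
		if (all[i].parent_relid == parent_relid)
			out[n++] = all[i].child_relid;
	}
	return n;
}

/*
 * New approach: the parent already holds its children in an array, so
 * no filtering is needed and the bound order is preserved for free.
 */
static int
collect_from_array(const ToyRel *rel, int *out)
{
	int			i;

	assert(rel->nparts == 0 || rel->part_appinfos != NULL);
	for (i = 0; i < rel->nparts; i++)
		out[i] = rel->part_appinfos[i]->child_relid;
	return rel->nparts;
}
```

Both functions return the same children for a given parent; the second one simply avoids looking at links that belong to other parents, which is also what makes it easy to index children by partition number later on.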
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bba6d09091..af4612da44 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,22 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
+ /*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel. Note
+ * that rel (the parent) might just be a union all subquery, in which
+ * case, there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,14 +1220,29 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1267,44 +1316,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining the live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1366,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8bc15c35d..7a09f07b15 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index daa8f516ce..dcfda1c3cc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1108b6a0ea..c445f401d9 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
0005-Teach-planner-to-use-get_partitions_from_clauses-v14.patch
From e8b926c35a39dc5aed7e074b6c029e258f781e33 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
src/backend/optimizer/path/allpaths.c | 406 +++++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 33 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 340 +++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 47 ++-
7 files changed, 786 insertions(+), 79 deletions(-)
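To illustrate the idea in the commit message above, here is a toy sketch (not code from the patch) of why searching the sorted partition bounds is cheaper than checking each partition in turn. It assumes a single integer key and range partitions described only by their sorted, exclusive upper bounds; first_bound_above and toy_prune_range are hypothetical names. The real get_partitions_from_clauses() additionally has to deal with multi-column keys, list and hash strategies, NULLs, and the default partition.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first element of upper[0..n) that is strictly
 * greater than key, or n if there is none.  Plain binary search.
 */
static int
first_bound_above(const int *upper, int n, int key)
{
	int			lo = 0;
	int			hi = n;

	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (upper[mid] > key)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/*
 * Toy range pruning.  Partition i holds keys v such that
 * upper[i - 1] <= v < upper[i]; partition 0 has no lower bound, and keys
 * >= upper[nparts - 1] belong to no partition.
 *
 * For the query qmin <= key <= qmax, store the index of the first
 * partition to scan in *first and return how many consecutive partitions
 * must be scanned (0 if none).  Two binary searches replace nparts
 * separate constraint checks.
 */
static int
toy_prune_range(const int *upper, int nparts, int qmin, int qmax, int *first)
{
	int			lo_part;
	int			hi_part;

	assert(nparts > 0);
	if (qmin > qmax)
		return 0;				/* contradictory clauses */

	lo_part = first_bound_above(upper, nparts, qmin);
	if (lo_part >= nparts)
		return 0;				/* query lies above the last bound */

	hi_part = first_bound_above(upper, nparts, qmax);
	if (hi_part >= nparts)
		hi_part = nparts - 1;	/* clamp to the last partition */

	*first = lo_part;
	return hi_part - lo_part + 1;
}
```

With upper bounds {10, 20, 30, 40}, a query for keys between 12 and 25 selects only the partitions at index 1 and 2, and the work done grows with the logarithm of the number of partitions rather than linearly.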
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index af4612da44..d222effd38 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,397 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests against partition keys. contains_const and constfalse are set
+ * by match_clauses_to_partkey() as described in its header comment.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /* A constant-false clause means no partition can match. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ /*
+ * If we have matched clauses that contain at least one constant operand,
+ * then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that the behavior with null
+ * arguments is easy to reason about.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(estimate_expression_value(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1289,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 5e03f8bc21..5bd30312cb 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f7438714c4..df963f701f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1248,21 +1247,25 @@ get_relation_constraints(PlannerInfo *root,
}
/* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1923,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c445f401d9..bcb669d212 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..e950cff6d2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -120,6 +120,8 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
QUERY PLAN
-------------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_ef
@@ -128,12 +130,14 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-(9 rows)
+(11 rows)
explain (costs off) select * from lp where a not in ('a', 'd');
QUERY PLAN
------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_bc
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_ef
@@ -142,7 +146,7 @@ explain (costs off) select * from lp where a not in ('a', 'd');
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_default
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-(9 rows)
+(11 rows)
-- collation matches the partitioning collation, pruning works
create table coll_pruning (a text collate "C") partition by list (a);
@@ -208,16 +212,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +521,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +575,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +649,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +657,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +712,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +888,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +900,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +961,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1007,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1030,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1071,253 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- some more cases
+-- pruning for partitioned table appearing inside a sub-query
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..72dae80e8a 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,49 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- some more cases
+
+-- pruning for partitioned table appearing inside a sub-query
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
--
2.11.0
On 12 December 2017 at 22:13, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
Thanks for sending the updated patches.
I don't have a complete review at the moment, but the following code
in set_append_rel_pathlist() should be removed.
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
continue;
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2017/12/13 18:48, David Rowley wrote:
On 12 December 2017 at 22:13, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
Thanks for sending the updated patches.
I don't have a complete review at the moment, but the following code
in set_append_rel_pathlist() should be removed.
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
continue;
Will fix that right away, thanks.
Thanks,
Amit
On 12 December 2017 at 22:13, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
Hi Amit,
I'm sorry to say this is another micro review per code I'm stumbling
over when looking at the run-time partition pruning stuff.
1. In get_partitions_from_clauses_recurse(), since you're assigning
the result to the first input, the following should use
bms_add_members and not bms_union. The logical end result is the same,
but using bms_union means a wasted palloc and a small memory leak
within the memory context.
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
or_partset = bms_union(or_partset, arg_partset);
2. Also in get_partitions_from_clauses_recurse(), it might also be
worth putting in a bms_free(or_partset) after:
/*
* Partition sets obtained from mutually-conjunctive clauses are
* combined using set intersection.
*/
result = bms_intersect(result, or_partset);
Also, instead of using bms_intersect here, would it be better to do:
result = bms_del_members(result, or_partset); ?
That way you don't do a bms_copy and leak memory for each OR branch,
since bms_intersect also does a bms_copy().
The resulting set could end up with a few more trailing 0 words than
what you have now, but it seems to be a better idea not to allocate a
new set each time.
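To make the allocation difference in points 1 and 2 concrete, here is a minimal sketch using a toy one-word bit set rather than PostgreSQL's Bitmapset. The type ToySet and the functions toy_make, toy_union, toy_add_members and toy_int_members are hypothetical stand-ins (for bms_union, bms_add_members and bms_int_members respectively); NULL-means-empty handling and multi-word growth are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a Bitmapset: a single 64-bit word kept on the heap. */
typedef struct ToySet
{
	uint64_t	bits;
} ToySet;

/* Counts allocations so the cost of each style is visible. */
static int	toy_alloc_count = 0;

static ToySet *
toy_make(uint64_t bits)
{
	ToySet	   *s = malloc(sizeof(ToySet));

	if (s == NULL)
		abort();
	s->bits = bits;
	toy_alloc_count++;
	return s;
}

/* Allocating union (bms_union style): both inputs are left untouched. */
static ToySet *
toy_union(const ToySet *a, const ToySet *b)
{
	return toy_make(a->bits | b->bits);
}

/* In-place union (bms_add_members style): a is modified and returned. */
static ToySet *
toy_add_members(ToySet *a, const ToySet *b)
{
	a->bits |= b->bits;
	return a;
}

/* In-place intersection (bms_int_members style): a is modified and returned. */
static ToySet *
toy_int_members(ToySet *a, const ToySet *b)
{
	a->bits &= b->bits;
	return a;
}
```

With the allocating variant, writing `or_partset = toy_union(or_partset, arg_partset)` drops the only pointer to the old or_partset, which is the leak described above; the in-place variants reuse the first argument instead.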
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 19 December 2017 at 17:36, David Rowley <david.rowley@2ndquadrant.com> wrote:
Also, instead of using bms_intersect here, would it be better to do:
result = bms_del_members(result, or_partset); ?
I should have said bms_int_members() rather than bms_del_members()
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 December 2017 at 22:13, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
Hi Amit,
I was also wondering about your thoughts on the design of
get_partitions_for_keys() and more generally how there are many
functions which have some special treatment doing something based on
->strategy == PARTITION_STRATEGY_XXX.
If I do:
git grep PARTITION_STRATEGY -- src/backend/catalog/partition.c | wc -l
I get 62 matches, most of which are case statements, and most of the
remainder are things like if (key->strategy ==
PARTITION_STRATEGY_HASH).
git grep --show-function PARTITION_STRATEGY -- src/backend/catalog/partition.c
shows that get_partitions_for_keys() is probably the most guilty of
having the most strategy condition tests.
Also, if we look at get_partitions_for_keys() there's an unconditional:
memset(hash_isnull, false, sizeof(hash_isnull));
which is only used for PARTITION_STRATEGY_HASH, but LIST and RANGE
must pay the price of that memset. Perhaps it's not expensive enough
to warrant only doing that when partkey->strategy ==
PARTITION_STRATEGY_HASH, but it does make me question if we should
have 3 separate functions for this and just have a case statement to
call the correct one.
I think if we were to put this off as something we'll fix later, then
the job would just become harder and harder as time goes on.
It might have been fine when we just had RANGE and LIST partitioning,
but I think HASH really tips the scales over to this being needed.
What do you think?
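As a rough illustration of the shape being suggested (one function per strategy, with a single case statement choosing between them), here is a self-contained sketch. Everything in it is hypothetical: ToyStrategy, ToyPartKey and the toy_partition_for_key_* functions are made-up names, the key is a single int, and the hash case skips real hashing.

```c
#include <assert.h>
#include <stddef.h>

typedef enum ToyStrategy
{
	TOY_STRATEGY_HASH,
	TOY_STRATEGY_LIST,
	TOY_STRATEGY_RANGE
} ToyStrategy;

typedef struct ToyPartKey
{
	ToyStrategy strategy;
	int			nparts;

	/*
	 * list: one accepted value per partition; range: exclusive upper bound
	 * per partition, in ascending order; hash: unused.
	 */
	const int  *bounds;
} ToyPartKey;

/* Hash: only needs the modulus (a real version would hash the value first). */
static int
toy_partition_for_key_hash(const ToyPartKey *key, int value)
{
	return (int) ((unsigned int) value % (unsigned int) key->nparts);
}

/* List: look for a partition that accepts exactly this value. */
static int
toy_partition_for_key_list(const ToyPartKey *key, int value)
{
	for (int i = 0; i < key->nparts; i++)
		if (key->bounds[i] == value)
			return i;
	return -1;
}

/* Range: first partition whose exclusive upper bound is above the value. */
static int
toy_partition_for_key_range(const ToyPartKey *key, int value)
{
	for (int i = 0; i < key->nparts; i++)
		if (value < key->bounds[i])
			return i;
	return -1;
}

/* The only place that tests the strategy. */
static int
toy_partition_for_key(const ToyPartKey *key, int value)
{
	switch (key->strategy)
	{
		case TOY_STRATEGY_HASH:
			return toy_partition_for_key_hash(key, value);
		case TOY_STRATEGY_LIST:
			return toy_partition_for_key_list(key, value);
		case TOY_STRATEGY_RANGE:
			return toy_partition_for_key_range(key, value);
	}
	return -1;
}
```

Each per-strategy function touches only the state it needs, so hash-only setup (such as the memset mentioned above) would not be paid for by LIST or RANGE.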
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 19 December 2017 at 17:36, David Rowley <david.rowley@2ndquadrant.com> wrote:
I'm sorry to say this is another micro review per code I'm stumbling
over when looking at the run-time partition pruning stuff.
Again, another micro review. I apologise for the slow trickle of
review. Again, these are just things I'm noticing while reading
through while thinking of the run-time pruning patch.
1. The following Assert appears to be testing for the presence of
cosmic rays :-)
/*
* Determine set of partitions using provided keys, which proceeds in a
* manner determined by the partitioning method.
*/
if (keys->n_eqkeys == partkey->partnatts)
{
Assert(keys->n_eqkeys == partkey->partnatts);
Perhaps it's misplaced during a rewrite? Should be safe enough to
remove it, I'd say.
2. The following code in classify_partition_bounding_keys() misses
looking under the RelabelType for rightarg:
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
rightop = (Expr *) get_rightop(clause);
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
constexpr = leftop;
This breaks the following case:
create table thisthat (a varchar not null) partition by list (a);
create table this partition of thisthat for values in('this');
create table that partition of thisthat for values in('that');
explain select * from thisthat where 'this' = a; -- does not work
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..54.00 rows=14 width=32)
-> Seq Scan on that (cost=0.00..27.00 rows=7 width=32)
Filter: ('this'::text = (a)::text)
-> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
Filter: ('this'::text = (a)::text)
(5 rows)
explain select * from thisthat where a = 'this'; -- works as we look
through the RelabelType on left arg.
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..27.00 rows=7 width=32)
-> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
Filter: ((a)::text = 'this'::text)
3. The following code assumes there will be a commutator for the operator:
if (constexpr == rightop)
pc->op = opclause;
else
{
OpExpr *commuted;
commuted = (OpExpr *) copyObject(opclause);
commuted->opno = get_commutator(opclause->opno);
commuted->opfuncid = get_opcode(commuted->opno);
commuted->args = list_make2(rightop, leftop);
pc->op = commuted;
}
I had to hunt for it, but it appears that you're pre-filtering clauses
with the Consts on the left and no valid commutator in
match_clauses_to_partkey. I think it's likely worth putting a comment
to mention that reversed clauses with no commutator should have been
filtered out beforehand. I'd say it's also worthy of an Assert().
4. The spelling of "arbitrary" is incorrect in:
* partattno == 0 refers to arbirtary expressions, which get the
5. I've noticed that partition pruning varies slightly from constraint
exclusion in the following case:
create table ta (a int not null) partition by list (a);
create table ta1 partition of ta for values in(1,2);
create table ta2 partition of ta for values in(3,4);
explain select * from ta where a <> 1 and a <> 2; -- partition ta1 is
not eliminated.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..96.50 rows=5050 width=4)
-> Seq Scan on ta1 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(5 rows)
alter table ta1 add constraint ta1_chk check (a in(1,2)); -- add a
check constraint to see if can be removed.
explain select * from ta where a <> 1 and a <> 2; -- it can.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..48.25 rows=2525 width=4)
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(3 rows)
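Going back to point 2, the idea of looking through the relabel on both operands before matching either against the partition key can be sketched as follows. This is only an illustration on toy node types: ToyExpr, toy_strip_relabel and toy_match_partkey are hypothetical names, not the planner's Expr, RelabelType or EXPR_MATCHES_PARTKEY, and the loop that strips nested wrappers is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum ToyTag
{
	TOY_VAR,
	TOY_CONST,
	TOY_RELABEL
} ToyTag;

typedef struct ToyExpr
{
	ToyTag		tag;
	int			attno;			/* TOY_VAR: column number */
	struct ToyExpr *arg;		/* TOY_RELABEL: wrapped expression */
} ToyExpr;

/* Peel off any relabel (binary-compatible cast) wrappers. */
static const ToyExpr *
toy_strip_relabel(const ToyExpr *e)
{
	while (e != NULL && e->tag == TOY_RELABEL)
		e = e->arg;
	return e;
}

/*
 * Returns 1 if the partition key is the left operand, -1 if it is the right
 * operand (the clause would then need commuting), and 0 if neither matches.
 * Both operands are stripped before the comparison.
 */
static int
toy_match_partkey(const ToyExpr *leftop, const ToyExpr *rightop, int partattno)
{
	leftop = toy_strip_relabel(leftop);
	rightop = toy_strip_relabel(rightop);

	if (leftop->tag == TOY_VAR && leftop->attno == partattno)
		return 1;
	if (rightop->tag == TOY_VAR && rightop->attno == partattno)
		return -1;
	return 0;
}
```

In this toy model, `'this' = a` and `a = 'this'` both match the key; they differ only in whether the clause has to be commuted afterwards.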
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello David.
Thanks for the reviews. Replying to all your emails here.
On 2017/12/19 13:36, David Rowley wrote:
On 12 December 2017 at 22:13, Amit Langote wrote:
Attached updated patches.
Hi Amit,
I'm sorry to say this is another micro review per code I'm stumbling
over when looking at the run-time partition pruning stuff.
1. In get_partitions_from_clauses_recurse(), since you're assigning
the result to the first input, the following should use
bms_add_members and not bms_union. The logical end result is the same,
but using bms_union means a wasted palloc and a small memory leak
within the memory context.
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
or_partset = bms_union(or_partset, arg_partset);
Done. Replaced with bms_add_members().
2. Also in get_partitions_from_clauses_recurse(), it might also be
worth putting in a bms_free(or_partset) after:
/*
* Partition sets obtained from mutually-conjunctive clauses are
* combined using set intersection.
*/
result = bms_intersect(result, or_partset);
Done, too.
Also, instead of using bms_intersect here, would it be better to do:
result = bms_del_members(result, or_partset); ?
That way you don't do a bms_copy and leak memory for each OR branch,
since bms_intersect also does a bms_copy().
The resulting set could end up with a few more trailing 0 words than
what you have now, but it seems to be a better idea not to allocate a
new set each time.
You meant bms_int_members(), as you also said in your other email.
On 2017/12/19 14:42, David Rowley wrote:
I was also wondering about your thoughts on the design of
get_partitions_for_keys() and more generally how there are many
functions which have some special treatment doing something based on
->strategy == PARTITION_STRATEGY_XXX.
If I do:
git grep PARTITION_STRATEGY -- src/backend/catalog/partition.c | wc -l
I get 62 matches, most of which are case statements, and most of the
remainder are things like if (key->strategy ==
PARTITION_STRATEGY_HASH).
git grep --show-function PARTITION_STRATEGY -- src/backend/catalog/partition.c
shows that get_partitions_for_keys() is probably the most guilty of
having the most strategy condition tests.
I notice that too now that you mention it.
Also, if we look at get_partitions_for_keys() there's an unconditional:
memset(hash_isnull, false, sizeof(hash_isnull));
which is only used for PARTITION_STRATEGY_HASH, but LIST and RANGE
must pay the price of that memset.
Although I know you're talking about something else here (about which I
say below), turns out this hash_isnull was completely unnecessary, so I
got rid of it.
Perhaps it's not expensive enough
to warrant only doing that when partkey->strategy ==
PARTITION_STRATEGY_HASH, but it does make me question if we should
have 3 separate functions for this and just have a case statement to
call the correct one.
I think if we were to put this off as something we'll fix later, then
the job would just become harder and harder as time goes on.
It might have been fine when we just had RANGE and LIST partitioning,
but I think HASH really tips the scales over to this being needed.
What do you think?
I think I somewhat understand your concern with regard to future additions
and maintenance and also now tend to agree.
I tried dividing up get_partitions_for_keys() into one function each for
hash, list, and range and it looks like in the attached. I think I like
the result. Each function has to deal with only query keys and bounds
assuming a given partitioning method and that appears to add to the
overall clarity of the code.
On 2017/12/19 22:44, David Rowley wrote:
Again, another micro review. I apologise for the slow trickle of
review. Again, these are just things I'm noticing while reading
through while thinking of the run-time pruning patch.
1. The following Assert appears to be testing for the presence of
cosmic rays :-)
/*
* Determine set of partitions using provided keys, which proceeds in a
* manner determined by the partitioning method.
*/
if (keys->n_eqkeys == partkey->partnatts)
{
Assert(keys->n_eqkeys == partkey->partnatts);
Perhaps it's misplaced during a rewrite? Should be safe enough to
remove it, I'd say.
I noticed that too and took care of it. :)
2. The following code in classify_partition_bounding_keys() misses
looking under the RelabelType for rightarg:
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
rightop = (Expr *) get_rightop(clause);
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
constexpr = leftop;
This breaks the following case:
create table thisthat (a varchar not null) partition by list (a);
create table this partition of thisthat for values in('this');
create table that partition of thisthat for values in('that');
explain select * from thisthat where 'this' = a; -- does not work
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..54.00 rows=14 width=32)
-> Seq Scan on that (cost=0.00..27.00 rows=7 width=32)
Filter: ('this'::text = (a)::text)
-> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
Filter: ('this'::text = (a)::text)
(5 rows)
explain select * from thisthat where a = 'this'; -- works as we look
through the RelabelType on left arg.
QUERY PLAN
------------------------------------------------------------
Append (cost=0.00..27.00 rows=7 width=32)
-> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
Filter: ((a)::text = 'this'::text)
Thanks for pointing it out, fixed.
3. The following code assumes there will be a commutator for the operator:
if (constexpr == rightop)
pc->op = opclause;
else
{
OpExpr *commuted;
commuted = (OpExpr *) copyObject(opclause);
commuted->opno = get_commutator(opclause->opno);
commuted->opfuncid = get_opcode(commuted->opno);
commuted->args = list_make2(rightop, leftop);
pc->op = commuted;
}
I had to hunt for it, but it appears that you're pre-filtering clauses
with the Consts on the left and no valid commutator in
match_clauses_to_partkey. I think it's likely worth putting a comment
to mention that reversed clauses with no commutator should have been
filtered out beforehand. I'd say it's also worthy of an Assert().
Yeah, added a comment and an Assert.
4. The spelling of "arbitrary" is incorrect in:
* partattno == 0 refers to arbirtary expressions, which get the
Fixed.
5. I've noticed that partition pruning varies slightly from constraint
exclusion in the following case:
create table ta (a int not null) partition by list (a);
create table ta1 partition of ta for values in(1,2);
create table ta2 partition of ta for values in(3,4);
explain select * from ta where a <> 1 and a <> 2; -- partition ta1 is
not eliminated.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..96.50 rows=5050 width=4)
-> Seq Scan on ta1 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(5 rows)
alter table ta1 add constraint ta1_chk check (a in(1,2)); -- add a
check constraint to see if can be removed.
explain select * from ta where a <> 1 and a <> 2; -- it can.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..48.25 rows=2525 width=4)
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(3 rows)
I see. It seems that the current approach of handling <> operators by
turning clauses containing the same into (key > const OR key < const)
doesn't always work. I think I had noticed that for list partitioning at
least. I will work on an alternative way of handling that in the next
version of the patch.
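To spell out why ta1 survives in the example above, here is a small sketch on a toy model where a list partition is just an array of accepted int values. It assumes each <> clause is evaluated on its own as (key < const OR key > const) and the per-clause results are then ANDed, which is one plausible reading of the current behaviour. The "combined" variant is only an assumption about what an alternative could look like, not a description of the next patch version; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Does the partition accept at least one value with (v < c OR v > c)? */
static bool
toy_survives_one_ne(const int *accepted, int naccepted, int c)
{
	for (int i = 0; i < naccepted; i++)
		if (accepted[i] < c || accepted[i] > c)
			return true;
	return false;
}

/*
 * Per-clause evaluation: the partition is kept if it survives every <>
 * clause taken individually.
 */
static bool
toy_survives_per_clause(const int *accepted, int naccepted,
						const int *excluded, int nexcluded)
{
	for (int j = 0; j < nexcluded; j++)
		if (!toy_survives_one_ne(accepted, naccepted, excluded[j]))
			return false;
	return true;
}

/*
 * Combined evaluation: the partition is kept only if some accepted value
 * differs from every excluded constant at once.
 */
static bool
toy_survives_combined(const int *accepted, int naccepted,
					  const int *excluded, int nexcluded)
{
	for (int i = 0; i < naccepted; i++)
	{
		bool		hit = false;

		for (int j = 0; j < nexcluded; j++)
		{
			if (accepted[i] == excluded[j])
			{
				hit = true;
				break;
			}
		}
		if (!hit)
			return true;
	}
	return false;
}
```

For ta1 (values 1 and 2) with `a <> 1 and a <> 2`, the per-clause check keeps the partition because 2 satisfies the first clause and 1 satisfies the second, while the combined check finds no single value that satisfies both and prunes it.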
Meanwhile, please find attached patches (v15) that take care of the rest
of the comments. Most of the updates are to patch 0002, compared to the
last (v14) version.
Thanks again for your thoughtful review comments.
Thanks,
Amit
Attachments:
0001-Some-interface-changes-for-partition_bound_-cmp-bsea-v15.patch (text/plain; charset=UTF-8)
From 35ae8140a20aed5e0faa0c47b6932244d9a097f6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 1/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5c4018e9f7..dc631b2761 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2534,12 +2578,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2566,11 +2613,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2778,12 +2829,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2805,11 +2856,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2818,25 +2869,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2847,12 +2928,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2866,20 +2948,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2892,8 +2973,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
0002-Introduce-a-get_partitions_from_clauses-v15.patch (text/plain; charset=UTF-8)
From 7ebe526aa1ec1a2c4e6905575bc1cff4b502e637 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
---
src/backend/catalog/partition.c | 1784 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 1793 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dc631b2761..22de48ac4d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,69 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, so we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +278,31 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static int32 partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1578,9 +1670,1701 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate a correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses);
+
+ /*
+ * classify_partition_bounding_keys() may have found a false
+ * pseudo-constant clause that the planner did not act on, or it may
+ * itself have found contradictions among the clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys yet. They
+ * will be handled below by recursively calling this function for each of
+ * OR clauses' arguments and combining the resulting partition sets
+ * appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ ListCell *lc1;
+ Bitmapset *or_partset = NULL;
+
+ foreach(lc1, or->args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc1));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to or_partset. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation,
+ rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ or_partset = bms_add_members(or_partset, arg_partset);
+ }
+
+ /*
+ * Partition sets obtained from mutually-conjunctive clauses are
+ * combined using set intersection.
+ */
+ result = bms_int_members(result, or_partset);
+ bms_free(or_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause that is false.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing an equality operator, unless hash
+ * partitioning is in use, in which case it's possible that some keys have
+ * IS NULL clauses while the remaining ones have equality operator clauses.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool only_or_clauses = true;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i],
+ partcoll = partkey->partcollation[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 denotes an arbitrary expression, which is fetched
+ * from the PartitionKey's list of expressions (partexprs).
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Expr *ltexpr,
+ *gtexpr;
+ Oid negator,
+ ltop,
+ gtop;
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is '<>', check if its
+ * negator is an equality operator. If so and it's a btree
+ * equality operator, we can use a special trick to prune
+ * partitions that won't satisfy the original '<>'
+ * operator -- we generate an OR expression
+ * 'leftop < rightop OR leftop > rightop' and add it to
+ * *or_clauses.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ {
+ Expr *or;
+
+ ltop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTLessStrategyNumber);
+ gtop = get_opfamily_member(partopfamily,
+ lefttype, righttype,
+ BTGreaterStrategyNumber);
+ ltexpr = make_opclause(ltop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ gtexpr = make_opclause(gtop, BOOLOID, false,
+ (Expr *) leftop,
+ (Expr *) rightop,
+ InvalidOid, partcoll);
+ or = makeBoolExpr(OR_EXPR,
+ list_make2(ltexpr, gtexpr), -1);
+ *or_clauses = lappend(*or_clauses, or);
+ continue;
+ }
+ }
+
+ /*
+ * Getting here means opclause uses an ordering op and
+ * hash partitioning is in use. We shouldn't try to
+ * reason about such an operator for the purposes of
+ * partition pruning, because hash partitioning doesn't
+ * make partitioning decisions based on relative ordering
+ * of keys.
+ */
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+
+ /*
+ * Since we only allow strict operators, require keys to be
+ * not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it if
+ * its negator is an equality operator that belongs to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: don't clobber the outer loop's i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ only_or_clauses = false;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ only_or_clauses = false;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (only_or_clauses || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ int32 op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ if (op_strategy < 0 &&
+ need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ else if (op_strategy == 0)
+ {
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ }
+ else if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found equal keys for all partition key
+ * columns. If so, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
+ * =, or >/>= operator, respectively. Sets *incl to true if equality is
+ * implied.
+ */
+static int32
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ int32 result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = 0;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = -1;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = 0;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = 1;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key type, that is, the type
+ * of datums stored in PartitionBoundInfo, there is no hope of using
+ * this expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * so add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with the same. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter contains
+ * a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy %c",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor
+ * using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Hash partitioning puts nulls into a normal partition and doesn't
+ * require defining a special null-accepting partition.
+ * Caller didn't count nulls as a valid key; do so ourselves.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ if (keys->keyisnull[i])
+ keys->n_eqkeys++;
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keys->keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* The computed hash value maps to no existing partition. */
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ /*
+ * Only a designated partition accepts nulls; if one exists,
+ * return it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * minoff set to -1 means all datums are greater than minkeys, which
+ * means all partitions satisfy minkeys. In that case, set minoff to
+ * the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if it does,
+ * but minkey isn't inclusive, move to the bound on the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged, because
+ * it simply means none of the partitions satisfies maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal), but the
+ * maxkey is not inclusive, then go to the bound on left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some datums
+ * (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ Bitmapset *result = NULL;
+
+ /*
+ * All datums between those at minoff and maxoff satisfy the query
+ * keys, so add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition:
+ * because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ /* No datum satisfies the keys and there is no default partition. */
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ /*
+ * Only a designated partition accepts nulls; if one exists,
+ * return it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && boundinfo->indexes[eqoff+1] >= 0)
+ {
+ /*
+ * eqoff gives us the bound that is known to be <= eqkeys,
+ * given how partition_bound_bsearch works. The bound at eqoff+1,
+ * then, would be the upper bound of the only partition that needs
+ * to be scanned.
+ */
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If only a prefix of the whole partition key is provided, there will
+ * be multiple partitions whose bounds share the same prefix. If minkey
+ * is inclusive, we must make minoff point to the leftmost such bound,
+ * making the result contain all such partitions. If it is exclusive,
+ * we must move minoff to the right such that minoff points to the
+ * first partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in the
+ * result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is known to
+ * be <= query's minkey. The bound at minoff + 1 (if there is one),
+ * then, would be the upper bound of the leftmost partition that needs
+ * to be scanned.
+ */
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ *
+ * Range partitioning has one more index than it has datums.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is known to
+ * be <= query's maxkey. The bound at maxoff+1, then, would be the
+ * upper bound of the rightmost partition that needs to be scanned.
+ * Although, if the bound is equal to maxkeys and the latter is not
+ * inclusive, then the bound at maxoff itself is the upper bound of
+ * the rightmost partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_def = false;
+ Bitmapset *result = NULL;
+
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of
+ * an unassigned range of values, move to the adjacent bound which must
+ * be the upper bound of the leftmost or rightmost partition,
+ * respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned. The
+ * default partition would've been included to cover those values.
+ * Although, if the original bound in question is an infinite value,
+ * there would not be any unassigned range to speak of, because the
+ * range is unbounded in that direction by definition, so no need to
+ * include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any non-default
+ * range partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6a2d5ad760..ce83fbcb22 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4745,7 +4743,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2983cfa217..7a5ab45c5c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
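
As a reading aid for the clause-simplification step in the patch above (this is not part of the patch), here is a minimal sketch using plain int constants. It assumes a single integer partition key; the names UpperChoice, choose_upper_bound, eq_consistent_with_lt and check_examples are hypothetical, and the real code instead clears slots of btree_clauses[] after calling partition_cmp_args() with the clause's own operator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes; the patch NULLs out a btree_clauses[] slot. */
typedef enum
{
    KEEP_LT,    /* the "a <= c" clause is redundant */
    KEEP_LE     /* the "a < c" clause is redundant */
} UpperChoice;

/*
 * Given "a < lt_const" and "a <= le_const", decide which one to keep by
 * evaluating lt_const <= le_const, which is what the patch does through
 * partition_cmp_args(..., le, lt, le, ...).
 */
static UpperChoice
choose_upper_bound(int lt_const, int le_const)
{
    return (lt_const <= le_const) ? KEEP_LT : KEEP_LE;
}

/*
 * Given "a = eq_const" and "a < lt_const", return false if the two clauses
 * contradict each other.  A true result means the "<" clause is implied by
 * the equality and can be dropped.
 */
static bool
eq_consistent_with_lt(int eq_const, int lt_const)
{
    return eq_const < lt_const;
}

/* The examples from the patch comments, written as checks. */
static void
check_examples(void)
{
    assert(choose_upper_bound(3, 3) == KEEP_LT);   /* a < 3, a <= 3 */
    assert(choose_upper_bound(3, 4) == KEEP_LT);   /* a < 3, a <= 4 */
    assert(choose_upper_bound(3, 2) == KEEP_LE);   /* a < 3, a <= 2 */
    assert(!eq_consistent_with_lt(5, 5));          /* a = 5, a < 5 */
    assert(eq_consistent_with_lt(3, 5));           /* a = 3, a < 5 */
}
```

Calling check_examples() exercises the same cases the patch comments walk through. The >, >= pair would be handled symmetrically.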
0003-Move-some-code-of-set_append_rel_size-to-separate-fu-v15.patch
From 80f19ef8a21001d5b4c114363fd38e5a4ee697ff Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH 3/5] Move some code of set_append_rel_size to separate
function
This moves the code that initializes basic properties of a partition
RelOptInfo from the information in the parent's RelOptInfo. It will need
to be called by the pairwise-join related code to minimally initialize
the partitions that earlier planning would have considered pruned and
hence left untouched. That's not true currently, because the current
pruning method touches each partition (setting its basic properties)
before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e8463e4a3..86e7a20da9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 674cfc6b06..daa8f516ce 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 99f65b44f2..00c134d5a3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -299,5 +299,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
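
To make the attr_needed loop in set_basic_child_rel_properties() easier to follow (this is not part of the patch), here is a minimal standalone sketch. It assumes a bitmask in place of Relids and an int array in place of appinfo->translated_vars, where 0 stands in for a NULL Var of a dropped parent column; RelidsMask and copy_attr_needed are hypothetical names.

```c
#include <assert.h>

typedef unsigned int RelidsMask;    /* hypothetical stand-in for Relids */

/*
 * Copy per-attribute "needed" sets from parent to child.
 *
 * parent_needed[] and child_needed[] are indexed by attno - min_attr.
 * translated[] is indexed by parent attno - 1 and gives the child attno,
 * or 0 if the parent column was dropped.
 */
static void
copy_attr_needed(const RelidsMask *parent_needed, int parent_min,
                 int parent_max, const int *translated,
                 RelidsMask *child_needed, int child_min)
{
    int attno;

    for (attno = parent_min; attno <= parent_max; attno++)
    {
        int index = attno - parent_min;

        if (attno <= 0)
        {
            /* System attributes do not need translation. */
            assert(parent_min == child_min);
            child_needed[index] = parent_needed[index];
        }
        else
        {
            int child_attno = translated[attno - 1];

            /* A dropped column cannot be referenced, so nothing to copy. */
            if (child_attno == 0)
            {
                assert(parent_needed[index] == 0);
                continue;
            }
            child_needed[child_attno - child_min] = parent_needed[index];
        }
    }
}
```

For example, with parent attributes -1..3, translated = {2, 0, 1} and parent_needed = {0x1, 0x0, 0x2, 0x0, 0x4}, the child entry for attribute 2 receives 0x2 and the one for attribute 1 receives 0x4, while the system attribute keeps 0x1.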
0004-More-refactoring-around-partitioned-table-AppendPath-v15.patch
From b5d1c0ae03e07ce72f2b67e7c3faeb76e98124f6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 133 +++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 19 +++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 +++++-
4 files changed, 128 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 86e7a20da9..83f79ea6cb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,22 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
+ /*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel. Note
+ * that rel (the parent) might just be a union all subquery, in which
+ * case, there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1220,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1312,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in the loop below and
+ * concatenate all into one list passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining live_partitioned_rels of the component partitioned
+ * tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1362,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 382791fadb..ffdf9c5247 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index daa8f516ce..dcfda1c3cc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1108b6a0ea..c445f401d9 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the same
+ * order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
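
To illustrate how live_partitioned_rels is meant to bubble up from a partitioned child to its parent in set_append_rel_size(), here is a minimal, self-contained sketch. It is a toy model only: ToyRel, toy_init_rel and toy_add_live_child are hypothetical names, and fixed-size int arrays stand in for the planner's List of RT indexes.

```c
#include <assert.h>

#define TOY_MAX_RELS 16

/* Hypothetical stand-in for the few RelOptInfo fields relevant here. */
typedef struct ToyRel
{
    int relid;                              /* RT index of this rel */
    int is_partitioned;                     /* nonzero for a partitioned table */
    int live_partitioned[TOY_MAX_RELS];     /* RT indexes collected so far */
    int nlive;                              /* number of valid entries above */
} ToyRel;

/* A partitioned rel starts with its own RT index in the list. */
static void
toy_init_rel(ToyRel *rel, int relid, int is_partitioned)
{
    rel->relid = relid;
    rel->is_partitioned = is_partitioned;
    rel->nlive = 0;
    if (is_partitioned)
        rel->live_partitioned[rel->nlive++] = relid;
}

/*
 * Called once per live child, after the child itself has been processed.
 * Nothing is propagated unless both parent and child are partitioned.
 */
static void
toy_add_live_child(ToyRel *parent, const ToyRel *child)
{
    int i;

    if (!parent->is_partitioned || !child->is_partitioned)
        return;

    for (i = 0; i < child->nlive; i++)
    {
        assert(parent->nlive < TOY_MAX_RELS);
        parent->live_partitioned[parent->nlive++] = child->live_partitioned[i];
    }
}
```

With a root (RT index 1) that has a leaf child (2) and a partitioned child (3), which in turn has a partitioned child (4), processing children bottom-up leaves the root holding 1, 3 and 4.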
0005-Teach-planner-to-use-get_partitions_from_clauses-v15.patch
From e5888750063ce306f420f23f53e2dcd13177f682 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
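
The difference between the two approaches described in the commit message can be sketched with a toy range-partitioned table. This is not how get_partitions_from_clauses() is implemented; it is a minimal illustration under stated assumptions: integer keys, an equality clause, partition i covering [previous upper bound, upper[i]), and hypothetical names (ToyRangeParts, toy_prune_linear, toy_prune_bsearch).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of range bounds, sorted in increasing order. */
typedef struct ToyRangeParts
{
    int nparts;         /* number of partitions */
    const int *upper;   /* exclusive upper bound of each partition */
    int lowest;         /* inclusive lower bound of partition 0 */
} ToyRangeParts;

/*
 * One-by-one approach: test the key against every partition's own
 * constraint in turn, the way constraint exclusion visits each child.
 * Returns the matching partition index, or -1 if none matches.
 */
static int
toy_prune_linear(const ToyRangeParts *p, int key)
{
    int lower = p->lowest;
    int i;

    for (i = 0; i < p->nparts; i++)
    {
        if (key >= lower && key < p->upper[i])
            return i;
        lower = p->upper[i];
    }
    return -1;
}

/*
 * Match-once approach: compare the key against the parent's sorted
 * bounds with a binary search for the first upper bound greater than key.
 */
static int
toy_prune_bsearch(const ToyRangeParts *p, int key)
{
    int lo = 0;
    int hi = p->nparts;

    if (key < p->lowest)
        return -1;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (p->upper[mid] > key)
            hi = mid;
        else
            lo = mid + 1;
    }
    return (lo < p->nparts) ? lo : -1;
}
```

Both functions return the same partition for any key; the second does so with a logarithmic number of comparisons against the parent's bounds instead of one constraint check per partition.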
src/backend/optimizer/path/allpaths.c | 406 +++++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 33 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 340 +++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 47 ++-
7 files changed, 786 insertions(+), 79 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 83f79ea6cb..eeaf8fd935 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,397 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests against partition keys. contains_const and constfalse report
+ * what kind of clauses were found.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ /*
+ * If we have matched clauses that contain at least one constant operand,
+ * then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contain a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely about
+ * the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(estimate_expression_value(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1289,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 5e03f8bc21..5bd30312cb 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f7438714c4..df963f701f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1248,21 +1247,25 @@ get_relation_constraints(PlannerInfo *root,
}
/* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1923,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c445f401d9..bcb669d212 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..e950cff6d2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -120,6 +120,8 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
QUERY PLAN
-------------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_ef
@@ -128,12 +130,14 @@ explain (costs off) select * from lp where a <> 'a' and a <> 'd';
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-(9 rows)
+(11 rows)
explain (costs off) select * from lp where a not in ('a', 'd');
QUERY PLAN
------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_bc
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_ef
@@ -142,7 +146,7 @@ explain (costs off) select * from lp where a not in ('a', 'd');
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-> Seq Scan on lp_default
Filter: (a <> ALL ('{a,d}'::bpchar[]))
-(9 rows)
+(11 rows)
-- collation matches the partitioning collation, pruning works
create table coll_pruning (a text collate "C") partition by list (a);
@@ -208,16 +212,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +521,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +575,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +649,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +657,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +712,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +888,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +900,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +961,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1007,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1030,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1071,253 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- some more cases
+-- pruning for partitioned table appearing inside a sub-query
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..72dae80e8a 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,49 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- some more cases
+
+-- pruning for partitioned table appearing inside a sub-query
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp;
--
2.11.0
On 2017/12/20 17:27, Amit Langote wrote:
On 2017/12/19 13:36, David Rowley wrote:
5. I've noticed that partition pruning varies slightly from constraint
exclusion in the following case:
create table ta (a int not null) partition by list (a);
create table ta1 partition of ta for values in(1,2);
create table ta2 partition of ta for values in(3,4);
explain select * from ta where a <> 1 and a <> 2; -- partition ta1 is not eliminated.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..96.50 rows=5050 width=4)
-> Seq Scan on ta1 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(5 rows)
alter table ta1 add constraint ta1_chk check (a in(1,2)); -- add a check constraint to see if it can be removed.
explain select * from ta where a <> 1 and a <> 2; -- it can.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..48.25 rows=2525 width=4)
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(3 rows)
I see. It seems that the current approach of handling <> operators by
turning clauses containing the same into (key > const OR key < const)
doesn't always work. I think I had noticed that for list partitioning at
least. I will work on an alternative way of handling that in the next
version of the patch.
I think I was able to make this work and in the process of making it work,
also came to the conclusion that this could be made to work sensibly
*only* for list partitioned tables. That's because one cannot prune a
given partition using a set of <> operator clauses, if we cannot be sure
that those clauses exclude *all* values of the partition key allowed by
that partition. It's only possible to do that for a list partitioned
table, because by definition one is required to spell out every value that
a given partition of such a table allows.
There is a new function in the updated patch that does the pruning using
<> operator clauses and it's implemented by assuming it's only ever called
for a list partitioned table. So, sorry range and hash partitioned tables.
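To illustrate the rule described above, here is a minimal standalone sketch
in C. It is not the function from the attached patch: the ListPartition
struct and both function names are hypothetical, keys are plain ints, and
NULLs and the default partition are ignored. It only shows the condition
itself: a list partition may be pruned by a set of <> clauses only if every
value the partition accepts appears among the constants of those clauses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one list partition: the values it accepts. */
typedef struct ListPartition
{
	const int  *values;
	size_t		nvalues;
} ListPartition;

/* Does 'val' appear among the constants of the "key <> const" clauses? */
static bool
value_is_excluded(int val, const int *ne_consts, size_t n_ne_consts)
{
	size_t		i;

	for (i = 0; i < n_ne_consts; i++)
	{
		if (ne_consts[i] == val)
			return true;
	}
	return false;
}

/*
 * Returns true if the partition can be pruned given the mutually ANDed
 * clauses "key <> ne_consts[0] AND key <> ne_consts[1] ...", that is, if
 * the clauses exclude *all* values the partition accepts.
 */
static bool
list_partition_pruned_by_ne(const ListPartition *part,
							const int *ne_consts, size_t n_ne_consts)
{
	size_t		i;

	/* A list partition always spells out at least one value. */
	assert(part->nvalues > 0);

	for (i = 0; i < part->nvalues; i++)
	{
		/* One surviving value is enough to keep the partition. */
		if (!value_is_excluded(part->values[i], ne_consts, n_ne_consts))
			return false;
	}
	return true;
}
```

Applied to David's example, ta1 accepts {1, 2} and the query has a <> 1 and
a <> 2, so ta1 is pruned, while ta2 (accepting {3, 4}) is kept. With only
a <> 1, ta1 has to stay because rows with a = 2 may still be in it. No such
enumeration of accepted values exists for a range or hash partition, which
is why the check is limited to list partitioning.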
Attached updated set of patches.
Thanks,
Amit
Attachments:
0001-Some-interface-changes-for-partition_bound_-cmp-bsea-v16.patch (text/plain; charset=UTF-8)
From c8dd132553a5793944888f7b836ba5cdf7510b33 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 1/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5c4018e9f7..dc631b2761 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2534,12 +2578,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2566,11 +2613,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2778,12 +2829,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2805,11 +2856,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2818,25 +2869,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2847,12 +2928,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2866,20 +2948,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2892,8 +2973,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
0002-Introduce-a-get_partitions_from_clauses-v16.patch (text/plain; charset=UTF-8)
From 70c87a6e324003206f3a3efc393691f188a7597c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
---
src/backend/catalog/partition.c | 1947 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 1956 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dc631b2761..b2a2ab6f3d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,69 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +278,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_from_ne_clauses(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses, List **ne_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static int32 partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1578,9 +1674,1860 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate a correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses,
+ *ne_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses, &ne_clauses);
+
+ /*
+ * classify_partition_bounding_keys() may have found a pseudo-constant
+ * clause that is false which the planner didn't catch, or it may have
+ * itself found contradictions among the clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys yet. They
+ * will be handled below by recursively calling this function for each of
+ * OR clauses' arguments and combining the resulting partition sets
+ * appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ /*
+ * Only keep the partitions in result that are not pruned by the clauses
+ * in ne_clauses.
+ */
+ if (ne_clauses)
+ result = bms_int_members(result,
+ get_partitions_from_ne_clauses(relation,
+ ne_clauses));
+
+ /*
+ * Ditto, but this time or_clauses.
+ */
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+
+ /*
+ * Clauses in or_clauses are themselves mutually conjunctive, so
+ * combine with result using set intersection.
+ */
+ result = bms_int_members(result,
+ get_partitions_from_or_clause_args(relation,
+ rt_index,
+ or->args));
+ }
+
+ return result;
+}
+
+/* Assumes partkey exists in the scope and is of a list partitioned table. */
+#define partkey_datums_equal(d1, d2)\
+ (0 == DatumGetInt32(FunctionCall2Coll(&partkey->partsupfunc[0],\
+ partkey->partcollation[0],\
+ (d1), (d2))))
+/*
+ * Check if d is equal to some member of darray where equality is determined
+ * by the partitioning comparison function.
+ */
+static bool
+datum_in_array(PartitionKey partkey, Datum d, Datum *darray, int n)
+{
+ int i;
+
+ if (darray == NULL || n == 0)
+ return false;
+
+ for (i = 0; i < n; i++)
+ if (partkey_datums_equal(d, darray[i]))
+ return true;
+
+ return false;
+}
+
+/*
+ * count_partition_datums
+ *
+ * Returns the number of non-null datums allowed by a non-default list
+ * partition with given index.
+ */
+static int
+count_partition_datums(Relation rel, int index)
+{
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ result = 0;
+
+ Assert(index != boundinfo->default_index);
+
+ /*
+ * The answer is the number of times the value index occurs in
+ * boundinfo->indexes[].
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ if (index == boundinfo->indexes[i])
+ result += 1;
+
+ return result;
+}
+
+/*
+ * get_partitions_from_ne_clauses
+ *
+ * Return partitions of relation that satisfy all <> operator clauses in
+ * ne_clauses. Only ever called if relation is a list partitioned table.
+ */
+static Bitmapset *
+get_partitions_from_ne_clauses(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *result,
+ *excluded_parts;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Datum *exclude_datums;
+ int *count_excluded,
+ n_exclude_datums,
+ i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+
+ /*
+ * How this works:
+ *
+ * For each constant expression, we look up the partition that would
+ * contain its value and mark it as an excluded partition. After
+ * doing the same for all clauses, we'll have the set of partitions
+ * that are excluded. For each excluded partition, check if there exist
+ * values that it allows but are not specified in the clauses; if so,
+ * the partition won't actually be excluded.
+ */
+
+ /* De-duplicate constant values. */
+ exclude_datums = (Datum *) palloc0(list_length(ne_clauses) *
+ sizeof(Datum));
+ n_exclude_datums = 0;
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum) &&
+ !datum_in_array(partkey, datum, exclude_datums, n_exclude_datums))
+ exclude_datums[n_exclude_datums++] = datum;
+ }
+
+ /*
+ * For each value, if it's found in boundinfo, increment the count of its
+ * partition as excluded due to that value.
+ */
+ count_excluded = (int *) palloc0(partdesc->nparts * sizeof(int));
+ for (i = 0; i < n_exclude_datums; i++)
+ {
+ int offset,
+ excluded_part;
+ bool is_equal;
+ PartitionBoundCmpArg arg;
+ Datum argdatums[] = {exclude_datums[i]};
+
+ memset(&arg, 0, sizeof(arg));
+ arg.datums = argdatums;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
+ {
+ excluded_part = boundinfo->indexes[offset];
+ count_excluded[excluded_part]++;
+ }
+ }
+
+ excluded_parts = NULL;
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ /*
+ * If all datums of this partition appeared in ne_clauses, exclude
+ * this partition.
+ */
+ if (count_excluded[i] > 0 &&
+ count_excluded[i] == count_partition_datums(relation, i))
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Also, exclude the "null-only" partition, because strict clauses in
+ * ne_clauses will not select any rows from it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ count_partition_datums(relation, boundinfo->null_index) == 0)
+ excluded_parts = bms_add_member(excluded_parts, boundinfo->null_index);
+
+ pfree(count_excluded);
+ pfree(exclude_datums);
+
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ result = bms_del_members(result, excluded_parts);
+ bms_free(excluded_parts);
+
+ return result;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to result. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we cannot depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ result = bms_add_members(result, arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing equality operator, unless hash
+ * partitioning is in use, in which case, it's possible that some keys have
+ * IS NULL clauses while remaining have clauses with equality operator.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ *
+ * Clauses containing a <> operator are added to *ne_clauses, provided the
+ * operator's negator is a valid partitioning equality operator, and only
+ * if list partitioning is in use.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses,
+ List **ne_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool will_compute_keys = false;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *ne_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to an arbitrary expression, which we fetch
+ * from the current position in the PartitionKey's partexprs list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm that the operator is '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne = true;
+ }
+
+ /*
+ * Unless we found a <> operator useful for pruning, we cannot
+ * use this clause at all, for example because it contains an
+ * ordering operator and hash partitioning is in use. So, move
+ * on. A useful <> clause is not turned into a key either; it
+ * is handed over to the caller through *ne_clauses below.
+ */
+ if (!is_ne)
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code that extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will pass such clauses to
+ * get_partitions_from_ne_clauses(), unlike what it does for
+ * clauses with non-<> operators.
+ */
+ if (is_ne)
+ *ne_clauses = lappend(*ne_clauses, pc);
+ else
+ {
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is not
+ * listed as part of any operator family. We can still handle
+ * it, provided its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator) || !op_in_opfamily(negator, partopfamily))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, elemno;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (elemno = 0; elemno < num_elems; elemno++)
+ {
+ if (!elem_nulls[elemno])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[elemno],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ will_compute_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (!will_compute_keys || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Now, generate the bounding tuples that can serve as equal, min, and
+ * max keys.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ int32 op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ if (op_strategy < 0 &&
+ need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ else if (op_strategy == 0)
+ {
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ }
+ else if (op_strategy > 0 && need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found equal keys for all partition key columns.
+ * If present, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
+ * =, or >/>= operator, respectively. Sets *incl to true if the operator
+ * also admits equality.
+ */
+static int32
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ int32 result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = 0;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = -1;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = 0;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = 1;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key type, that is, the type of
+ * datums stored in PartitionBoundInfo, no hope of using this
+ * expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * so add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with the same. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy %c",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor
+ * using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Hash partitioning puts nulls into a normal partition and doesn't
+ * require defining a special null-accepting partition.
+ * Caller didn't count nulls as a valid key; do so ourselves.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ if (keys->keyisnull[i])
+ keys->n_eqkeys++;
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keys->keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ /*
+ * Only the designated null-accepting partition or, failing that,
+ * the default partition accepts nulls; return whichever exists.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * minoff set to -1 means all datums are greater than minkeys, which
+ * means all partitions satisfy minkeys. In that case, set minoff to
+ * the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if it does,
+ * but minkey isn't inclusive, move to the bound on the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged, because
+ * it simply means none of the partitions satisfies maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal), but the
+ * maxkey is not inclusive, then go to the bound on left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exist at least some datums
+ * (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ Bitmapset *result = NULL;
+
+ /*
+ * All datums between those at minoff and maxoff satisfy the query
+ * keys, so add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && boundinfo->indexes[eqoff+1] >= 0)
+ {
+ /*
+ * eqoff gives us the bound that is known to be <= eqkeys,
+ * given how partition_bound_bsearch works. The bound at eqoff+1,
+ * then, would be the upper bound of the only partition that needs
+ * to be scanned.
+ */
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If only a prefix of the whole partition key is provided, there will
+ * be multiple partitions whose bounds share the same prefix. If minkey
+ * is inclusive, we must make minoff point to the leftmost such bound,
+ * making the result contain all such partitions. If it is exclusive,
+ * we must move minoff to the right such that minoff points to the
+ * first partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in the
+ * result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is known to
+ * be <= query's minkey. The bound at minoff + 1 (if there is one),
+ * then, would be the upper bound of the leftmost partition that needs
+ * to be scanned.
+ */
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ *
+ * boundinfo->indexes has one more entry than there are range datums.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is known to
+ * be <= query's maxkey. The bound at maxoff+1, then, would be the
+ * upper bound of the rightmost partition that needs to be scanned.
+ * Although, if the bound is equal to maxkeys and the latter is not
+ * inclusive, then the bound at maxoff itself is the upper bound of
+ * the rightmost partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exist at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_def = false;
+ Bitmapset *result = NULL;
+
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of
+ * an unassigned range of values, move to the adjacent bound which must
+ * be the upper bound of the leftmost or rightmost partition,
+ * respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned. The
+ * default partition would've been included to cover those values.
+ * Although, if the original bound in question is an infinite value,
+ * there would not be any unassigned range to speak of, because the
+ * range is unbounded in that direction by definition, so no need to
+ * include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any non-default
+ * range partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 6a2d5ad760..ce83fbcb22 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4745,7 +4743,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2983cfa217..7a5ab45c5c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
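
As a rough illustration of the clause-reduction steps in the pruning patch above (keep the most restrictive key per btree strategy, then compare keys across strategies), here is a minimal standalone sketch. It is not the patch's code: it assumes a single integer column, and Clause, eval_op(), reduce_clauses() and the ST_* slots are hypothetical names standing in for PartClause, partition_cmp_args() and the btree strategy numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 0-based strategy slots, standing in for btree strategy - 1. */
enum { ST_LT = 0, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX };

/* A clause "a <op> value" on a single integer column (illustrative only). */
typedef struct Clause
{
    int strategy;
    int value;
} Clause;

/* Evaluate "left <op> right" for the operator of the given strategy. */
static bool
eval_op(int strategy, int left, int right)
{
    switch (strategy)
    {
        case ST_LT: return left < right;
        case ST_LE: return left <= right;
        case ST_EQ: return left == right;
        case ST_GE: return left >= right;
        default:    return left > right;
    }
}

/*
 * Fill best[] with the most restrictive clause per strategy (or NULL).
 * Returns false if the clauses are found to be mutually contradictory.
 */
bool
reduce_clauses(const Clause *clauses, size_t n, const Clause *best[ST_MAX])
{
    size_t i;
    int    s;

    for (s = 0; s < ST_MAX; s++)
        best[s] = NULL;

    /* Step 1: within each strategy, keep the more restrictive constant. */
    for (i = 0; i < n; i++)
    {
        const Clause *cur = &clauses[i];

        s = cur->strategy;
        assert(s >= 0 && s < ST_MAX);
        if (best[s] == NULL || eval_op(s, cur->value, best[s]->value))
            best[s] = cur;
        else if (s == ST_EQ)
            return false;       /* e.g. a = 5 AND a = 7 */
    }

    /* Step 2: an equality key either contradicts or subsumes the others. */
    if (best[ST_EQ])
    {
        for (s = 0; s < ST_MAX; s++)
        {
            if (s == ST_EQ || best[s] == NULL)
                continue;
            if (!eval_op(s, best[ST_EQ]->value, best[s]->value))
                return false;   /* e.g. a = 5 AND a < 5 */
            best[s] = NULL;
        }
    }

    /* Step 3: keep only one of <, <= and only one of >, >=. */
    if (best[ST_LT] && best[ST_LE])
    {
        if (best[ST_LT]->value <= best[ST_LE]->value)
            best[ST_LE] = NULL;
        else
            best[ST_LT] = NULL;
    }
    if (best[ST_GT] && best[ST_GE])
    {
        if (best[ST_GT]->value >= best[ST_GE]->value)
            best[ST_GE] = NULL;
        else
            best[ST_GT] = NULL;
    }
    return true;
}
```

Unlike the patch, this sketch can always evaluate the comparison, so it has no counterpart to the case where partition_cmp_args() returns false and the clause is pushed to the output list unreduced.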
Attachment: 0003-Move-some-code-of-set_append_rel_size-to-separate-fu-v16.patch (text/plain; charset=UTF-8)
From 1eed277d8b57b28296f6578cf812d5cb3a9532ec Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. It will need to be called by the pairwise-join
related code to minimally initialize the partitions that earlier
planning would have considered pruned and hence left untouched.
That's not true currently, because the current pruning method
touches each partition (setting its basic properties) before
considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e8463e4a3..86e7a20da9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 674cfc6b06..daa8f516ce 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 99f65b44f2..00c134d5a3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -299,5 +299,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
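
To make the attr_needed handling in set_basic_child_rel_properties() above a bit more concrete, here is a simplified standalone sketch of the same mapping. It is an illustration under assumptions, not planner code: SketchRel and copy_attr_needed() are hypothetical, Relids is modeled as a plain bitmask, and a translated[] entry of 0 stands in for the NULL Var that marks a dropped parent column.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for RelOptInfo; Relids is modeled as a bitmask. */
typedef struct SketchRel
{
    int       min_attr;     /* lowest attno, <= 0 for system columns */
    int       max_attr;     /* highest attno */
    unsigned *attr_needed;  /* indexed by attno - min_attr */
} SketchRel;

/*
 * Copy the parent's attr_needed entries to the child. translated[attno - 1]
 * is the child's attno for parent user column attno, or 0 if that parent
 * column was dropped (the patch checks for a NULL Var in that case).
 */
void
copy_attr_needed(const SketchRel *parent, SketchRel *child,
                 const int *translated)
{
    int attno;

    for (attno = parent->min_attr; attno <= parent->max_attr; attno++)
    {
        int      index = attno - parent->min_attr;
        unsigned needed = parent->attr_needed[index];

        if (attno <= 0)
        {
            /* System attributes need no translation. */
            assert(parent->min_attr == child->min_attr);
            child->attr_needed[index] = needed;
        }
        else if (translated[attno - 1] != 0)
        {
            /* User column: store under the child's own attribute number. */
            int child_index = translated[attno - 1] - child->min_attr;

            child->attr_needed[child_index] = needed;
        }
    }
}
```

The point of the translation step is that a partition may have its columns in a different physical order than the parent, so the same Relids value has to land in a different slot of the child's array.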
Attachment: 0004-More-refactoring-around-partitioned-table-AppendPath-v16.patch (text/plain; charset=UTF-8)
From 949c7c39d23a69a1343525b5b682f15cb51d34ff Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 133 +++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 19 +++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 +++++-
4 files changed, 128 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 86e7a20da9..83f79ea6cb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,22 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
+ /*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel. Note
+ * that rel (the parent) might just be a union all subquery, in which
+ * case, there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1220,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1312,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining live_partitioned_rels of the component partitioned
+ * tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1362,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ childrel->live_partitioned_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 382791fadb..ffdf9c5247 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index daa8f516ce..dcfda1c3cc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1108b6a0ea..c445f401d9 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
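The rewritten get_partitioned_child_rels_for_join() in the patch above walks the members of join_relids and concatenates each member's live_partitioned_rels. The loop shape can be sketched outside the planner as below. This is only an illustration under stated assumptions: a uint64_t bitmask stands in for Bitmapset, demo_next_member is a hypothetical analogue of bms_next_member, and DemoJoinRel is a made-up struct holding a small fixed-size list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a RelOptInfo carrying live_partitioned_rels. */
typedef struct DemoJoinRel
{
	int			nlive;						/* entries used below */
	int			live_partitioned_rels[4];	/* RT indexes, self first */
} DemoJoinRel;

/* Return the smallest member of 'set' greater than 'prev', or -1 if none. */
static int
demo_next_member(uint64_t set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if (set & (UINT64_C(1) << i))
			return i;
	}
	return -1;
}

/*
 * Append the live partitioned rels of every member of join_relids to out[]
 * and return how many were written.  Members beyond the array or with no
 * rel are skipped, mirroring the "paranoia" checks in the patch.
 */
static int
demo_partitioned_rels_for_join(DemoJoinRel *const *rel_array, int array_size,
							   uint64_t join_relids, int *out)
{
	int			n = 0;
	int			relid = -1;

	while ((relid = demo_next_member(join_relids, relid)) >= 0)
	{
		const DemoJoinRel *rel;

		if (relid >= array_size)
			continue;			/* bogus index, ignore */
		rel = rel_array[relid];
		if (rel == NULL)
			continue;			/* empty slot, ignore */

		for (int i = 0; i < rel->nlive; i++)
			out[n++] = rel->live_partitioned_rels[i];
	}

	return n;
}
```

Starting the iteration at -1 and feeding the previous result back in visits members in ascending order, so the output list is grouped by component rel.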
Attachment: 0005-Teach-planner-to-use-get_partitions_from_clauses-v16.patch (text/plain; charset=UTF-8)
From 12a1be575ad9ee5e3cc857f7cb18df32a9d1b3ba Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
src/backend/optimizer/path/allpaths.c | 406 ++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 33 +-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 917 insertions(+), 77 deletions(-)
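To make the cost argument in the commit message concrete, here is a rough sketch contrasting a per-partition check with a single lookup over sorted bounds for a clause of the form "key = const". It is not the implementation of get_partitions_from_clauses(), whose internals are not part of this patch; DemoRange, demo_find_linear and demo_find_bsearch are hypothetical names, and the sketch assumes non-overlapping integer range partitions sorted by lower bound, with gaps allowed.

```c
#include <assert.h>

/* Hypothetical range partition: holds keys with lower <= key < upper. */
typedef struct DemoRange
{
	int			lower;
	int			upper;
} DemoRange;

/*
 * Per-partition check, analogous in cost to testing each partition's
 * constraint in turn: O(nparts) comparisons.
 */
static int
demo_find_linear(const DemoRange *parts, int nparts, int key)
{
	for (int i = 0; i < nparts; i++)
	{
		if (key >= parts[i].lower && key < parts[i].upper)
			return i;
	}
	return -1;					/* no partition accepts the key */
}

/*
 * Single lookup over the sorted bounds: O(log nparts) comparisons.
 * Returns the index of the matching partition, or -1 if the key falls
 * into a gap or outside all bounds.
 */
static int
demo_find_bsearch(const DemoRange *parts, int nparts, int key)
{
	int			lo = 0;
	int			hi = nparts - 1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (key < parts[mid].lower)
			hi = mid - 1;
		else if (key >= parts[mid].upper)
			lo = mid + 1;
		else
			return mid;
	}
	return -1;
}
```

Both functions return the same answer; the difference is only in how much work is done per query, which is the saving the commit message is hoping for.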
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 83f79ea6cb..eeaf8fd935 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,397 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key, including any nullness
+ * tests against partition keys, along with flags telling whether any
+ * clause has a constant operand or is a constant-false condition.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ /*
+ * If we have matched clauses that contain at least one constant operand,
+ * then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(estimate_expression_value(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1289,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 5e03f8bc21..5bd30312cb 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f7438714c4..df963f701f 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1248,21 +1247,25 @@ get_relation_constraints(PlannerInfo *root,
}
/* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1923,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c445f401d9..bcb669d212 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's type,
+ * store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..ad29f0f125 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..6921e39bfd 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 21 December 2017 at 23:38, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2017/12/20 17:27, Amit Langote wrote:
I think I was able to make this work and in the process of making it work,
also came to the conclusion that this could be made to work sensibly
*only* for list partitioned tables. That's because one cannot prune a
given partition using a set of <> operator clauses, if we cannot be sure
that those clauses exclude *all* values of the partition key allowed by
that partition. It's only possible to do that for a list partitioned
table, because by definition one is required to spell out every value that
a given partition of such table allows.
Makes sense. Thanks for fixing LIST partitioning to work with that.
We have no way to know that there's no value between 1::int and
2::int, so it's completely understandable why this can't work for
RANGE. HASH is also understandable since we don't have a complete
picture of all the values that can be contained within the partition.
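The list-partition rule above can be sketched in a few lines of standalone C. This is only an illustration under assumptions: the function name and the use of plain `int` values are hypothetical and are not taken from the patch. A partition is pruned only when every value it lists is named by some `<>` clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: returns true if a list
 * partition accepting exactly part_vals[0..nvals-1] can be pruned, given
 * the constants ne_vals[0..nne-1] that appear in "key <> const" clauses.
 */
bool
list_partition_excluded_by_ne(const int *part_vals, size_t nvals,
                              const int *ne_vals, size_t nne)
{
    assert(part_vals != NULL || nvals == 0);
    assert(ne_vals != NULL || nne == 0);

    for (size_t i = 0; i < nvals; i++)
    {
        bool        excluded = false;

        for (size_t j = 0; j < nne; j++)
        {
            if (part_vals[i] == ne_vals[j])
            {
                excluded = true;
                break;
            }
        }

        /* One surviving value means the partition may still hold matches. */
        if (!excluded)
            return false;
    }

    /* Be conservative about a partition that lists no values at all. */
    return nvals > 0;
}
```

With a partition listing (1, 2) and clauses `a <> 1 and a <> 2`, the helper reports the partition as prunable; a partition listing (3, 4) is kept. No such enumeration exists for a range or hash partition, which is why the same test cannot be applied there.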
I'll try to do another complete review of v16 soon.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 21 December 2017 at 23:38, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated set of patches.
Looks like the new not equals code does not properly take into account
a missing NULL partition.
create table ta (a int not null) partition by list (a);
create table ta1 partition of ta for values in(1,2);
create table ta2 partition of ta for values in(3,4);
explain select * from ta where a <> 1 and a <> 2;
ERROR: negative bitmapset member not allowed
-- Add null partition
create table ta_null partition of ta for values in(null);
explain select * from ta where a <> 1 and a <> 2; -- works now.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..48.25 rows=2525 width=4)
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(3 rows)
This code appears to be at fault:
/*
* Also, exclude the "null-only" partition, because strict clauses in
* ne_clauses will not select any rows from it.
*/
if (count_partition_datums(relation, boundinfo->null_index) == 0)
excluded_parts = bms_add_member(excluded_parts,
boundinfo->null_index);
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Just trying to understand the code here; some very minor comments as I
go along.
partition_op_strategy returning int32 looks pretty ugly, and the calling
code is not super-intelligible either. How about returning a value from
a new enum?
typedef PartClause is missing a struct name, as is our tradition.
+ * We don't a <> operator clause into a key right away.
Missing a word there.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David
On 2017/12/21 21:04, David Rowley wrote:
On 21 December 2017 at 23:38, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated set of patches.
Looks like the new not equals code does not properly take into account
a missing NULL partition.
create table ta (a int not null) partition by list (a);
create table ta1 partition of ta for values in(1,2);
create table ta2 partition of ta for values in(3,4);
explain select * from ta where a <> 1 and a <> 2;
ERROR: negative bitmapset member not allowed
-- Add null partition
create table ta_null partition of ta for values in(null);
explain select * from ta where a <> 1 and a <> 2; -- works now.
QUERY PLAN
-------------------------------------------------------------
Append (cost=0.00..48.25 rows=2525 width=4)
-> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 2))
(3 rows)
This code appears to be at fault:
/*
* Also, exclude the "null-only" partition, because strict clauses in
* ne_clauses will not select any rows from it.
*/
if (count_partition_datums(relation, boundinfo->null_index) == 0)
excluded_parts = bms_add_member(excluded_parts,
boundinfo->null_index);
Oops, must check before going to count datums that a null-partition exists
at all. Will post the fixed version shortly, thanks.
Regards,
Amit
On 22 December 2017 at 13:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Will post the fixed version shortly, thanks.
I've made another partial pass on the patch and have a few more
things. #3 and #4 are just comments rather than requests to change
something. I think we should change those before PG11 though.
1. If I look at the elog(ERROR) messages in partition.c, there are a
number of variations of reporting an invalid partition strategy.
There seem to be 3 variations of the same thing. Probably the
"unexpected" one would suit most, but I've not looked too closely.
elog(ERROR, "invalid partitioning strategy");
elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy);
elog(ERROR, "invalid partition strategy %c",
RelationGetPartitionKey(rel)->strategy);
2. In get_relation_constraints(). Can you add a comment to why you've added:
/* Append partition predicates, if any */
if (root->parse->commandType != CMD_SELECT)
{
I guess it must be because we use the new partition pruning code for
SELECT, but not for anything else.
3. It's a shame that RelOptInfo->live_partitioned_rels is a List and
not a Relids. I guess if you were to change that you'd need to also
change AppendPath->partitioned_rels too, so probably we can just fix
that later.
4. live_part_appinfos I think could be a Relids type too, but probably
we can change that after this patch. Append subpaths are sorted in
create_append_path() for parallel append, so the order of the subpaths
seems non-critical.
5. Small memory leaks in get_partitions_from_clauses_recurse().
if (ne_clauses)
result = bms_int_members(result,
get_partitions_from_ne_clauses(relation,
ne_clauses));
Can you assign the result of get_partitions_from_ne_clauses() and
bms_free() it after the bms_int_members() ?
Same for:
result = bms_int_members(result,
get_partitions_from_or_clause_args(relation,
rt_index,
or->args));
The reason I'm being particular about this is that for the run-time
pruning patch we'll call this from ExecReScanAppend() which will
allocate into the ExecutorState which lives as long as the query does.
So any leaks will last the entire length of the query.
ExecReScanAppend() could be called millions of billions of times, so
we need to be sure that's not going to be a problem.
6. Similar to #5, memory leaks in get_partitions_from_or_clause_args()
arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
arg_clauses);
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
result = bms_add_members(result, arg_partset);
Need to bms_free(arg_partset)
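The pattern requested in #5 and #6 (keep the temporary set in a variable, combine it, then free it) can be sketched with a toy set type. Everything here is a stand-in: `ToySet`, `toy_make`, `toy_free`, `toy_int_members` and `narrow_by_ne_clauses` are hypothetical, and `toy_make(ne_bits)` plays the role of get_partitions_from_ne_clauses(). The only property borrowed from the discussion is that the intersect step does not free its second argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a Bitmapset: a heap-allocated 64-bit mask. */
typedef struct ToySet
{
    uint64_t    bits;
} ToySet;

/* Number of ToySets currently allocated; lets a caller check for leaks. */
int         toy_live_sets = 0;

ToySet *
toy_make(uint64_t bits)
{
    ToySet     *s = malloc(sizeof(ToySet));

    if (s == NULL)
        abort();
    s->bits = bits;
    toy_live_sets++;
    return s;
}

void
toy_free(ToySet *s)
{
    if (s != NULL)
    {
        free(s);
        toy_live_sets--;
    }
}

/* Intersect b into a in place; b is untouched and still owned by the caller. */
ToySet *
toy_int_members(ToySet *a, const ToySet *b)
{
    assert(a != NULL && b != NULL);
    a->bits &= b->bits;
    return a;
}

/* Narrow result by a freshly computed set, then free that temporary. */
ToySet *
narrow_by_ne_clauses(ToySet *result, uint64_t ne_bits)
{
    ToySet     *ne_parts = toy_make(ne_bits);

    result = toy_int_members(result, ne_parts);
    toy_free(ne_parts);         /* without this, every call leaks one set */
    return result;
}
```

After any number of calls to `narrow_by_ne_clauses()`, `toy_live_sets` still counts only the caller's own result set, which is the property that matters if the function is reached on every rescan.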
Running out of time for today, but will look again in about 4 days.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017/12/22 1:06, Alvaro Herrera wrote:
Just trying to understand the code here; some very minor comments as I
go along.
partition_op_strategy returning int32 looks pretty ugly, and the calling
code is not super-intelligible either. How about returning a value from
a new enum?
OK, I made it the following enum:
typedef enum PartOpStrategy
{
PART_OP_EQUAL,
PART_OP_LESS,
PART_OP_GREATER,
} PartOpStrategy;
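As a rough sketch of how calling code might read with such an enum, the function below classifies a btree strategy number. This is an assumption-laden illustration, not the patch's partition_op_strategy(): the function name, its signature and the `TOY_BT_*` constants are hypothetical, with the constants chosen to mirror the usual btree strategy numbering (1 for less-than through 5 for greater-than).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum PartOpStrategy
{
    PART_OP_EQUAL,
    PART_OP_LESS,
    PART_OP_GREATER
} PartOpStrategy;

/* Hypothetical mirror of the btree strategy numbers. */
enum
{
    TOY_BT_LESS = 1,
    TOY_BT_LESS_EQUAL,
    TOY_BT_EQUAL,
    TOY_BT_GREATER_EQUAL,
    TOY_BT_GREATER
};

/*
 * Classify a btree strategy number.  Returns false for an unknown number;
 * otherwise sets *result and reports in *inclusive whether the operator
 * also accepts equality.
 */
bool
classify_btree_strategy(int strategy, PartOpStrategy *result, bool *inclusive)
{
    assert(result != NULL && inclusive != NULL);

    switch (strategy)
    {
        case TOY_BT_LESS:
            *result = PART_OP_LESS;
            *inclusive = false;
            return true;
        case TOY_BT_LESS_EQUAL:
            *result = PART_OP_LESS;
            *inclusive = true;
            return true;
        case TOY_BT_EQUAL:
            *result = PART_OP_EQUAL;
            *inclusive = true;
            return true;
        case TOY_BT_GREATER_EQUAL:
            *result = PART_OP_GREATER;
            *inclusive = true;
            return true;
        case TOY_BT_GREATER:
            *result = PART_OP_GREATER;
            *inclusive = false;
            return true;
        default:
            return false;
    }
}
```

A caller can then switch on the returned `PartOpStrategy` value by name instead of comparing against bare integers.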
typedef PartClause is missing a struct name, as is our tradition.
Will fix.
+ * We don't a <> operator clause into a key right away.
Missing a word there.
Oops, right. I meant "We don't turn a <> ...". Will fix.
Will post a new version after taking care of David's comments.
Thanks,
Amit
Hi David.
On 2017/12/22 10:35, David Rowley wrote:
On 22 December 2017 at 13:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Will post the fixed version shortly, thanks.
I've made another partial pass on the patch and have a few more
things. #3 and #4 are just comments rather than requests to change
something. I think we should change those before PG11 though.
Thank you.
1. If I look at the elog(ERROR) messages in partition.c, there are a
number of variations of reporting an invalid partition strategy.
There seem to be 3 variations of the same thing. Probably the
"unexpected" one would suit most, but I've not looked too closely.
elog(ERROR, "invalid partitioning strategy");
elog(ERROR, "unexpected partition strategy: %d", (int) key->strategy);
elog(ERROR, "invalid partition strategy %c",
RelationGetPartitionKey(rel)->strategy);
I should have used the "unexpected ..." wording in yesterday's update.
Fixed.
2. In get_relation_constraints(). Can you add a comment to why you've added:
/* Append partition predicates, if any */
if (root->parse->commandType != CMD_SELECT)
{
I guess it must be because we use the new partition pruning code for
SELECT, but not for anything else.
Yeah, I explained that a couple of times on email (maybe also in the
commit message), but not there. Done.
3. It's a shame that RelOptInfo->live_partitioned_rels is a List and
not a Relids. I guess if you were to change that you'd need to also
change AppendPath->partitioned_rels too, so probably we can just fix
that later.
I agree.
4. live_part_appinfos I think could be a Relids type too, but probably
we can change that after this patch. Append subpaths are sorted in
create_append_path() for parallel append, so the order of the subpaths
seems non-critical.
Hmm, perhaps.
5. Small memory leaks in get_partitions_from_clauses_recurse().
if (ne_clauses)
result = bms_int_members(result,
get_partitions_from_ne_clauses(relation,
ne_clauses));
Can you assign the result of get_partitions_from_ne_clauses() and
bms_free() it after the bms_int_members() ?
Same for:
result = bms_int_members(result,
get_partitions_from_or_clause_args(relation,
rt_index,
or->args));
The reason I'm being particular about this is that for the run-time
pruning patch we'll call this from ExecReScanAppend() which will
allocate into the ExecutorState which lives as long as the query does.
So any leaks will last the entire length of the query.
ExecReScanAppend() could be called millions of billions of times, so
we need to be sure that's not going to be a problem.
That's a very important point to stress. Thanks.
6. Similar to #5, memory leaks in get_partitions_from_or_clause_args()
arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
arg_clauses);
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
result = bms_add_members(result, arg_partset);
Need to bms_free(arg_partset)
Fixed all these instances of leaks.
Running out of time for today, but will look again in about 4 days.
Thanks again.
Please find attached updated patches.
Thanks,
Amit
Attachments:
0001-Some-interface-changes-for-partition_bound_-cmp-bsea-v17.patch (text/plain; charset=UTF-8)
From 720fcdde53a9f374ae2d88779952a993ac4091f0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH 1/5] Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 5c4018e9f7..dc631b2761 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2534,12 +2578,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2566,11 +2613,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2778,12 +2829,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2805,11 +2856,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2818,25 +2869,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If no tuple datum to compare with the bound, consider
+ * the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2847,12 +2928,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2866,20 +2948,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2892,8 +2973,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
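As a rough illustration of what the interface change in the patch above enables, the sketch below searches sorted multi-column bounds with a probe that supplies only a prefix of the key columns. It is a simplified toy, not the patch's code: ToyCmpArg, toy_bound_cmp, toy_bound_bsearch and TOY_NKEYS are hypothetical names, keys are plain ints, and only the "datums" (non-bound) form of the argument is modeled. The search loop keeps the same contract as partition_bound_bsearch: return the offset of the greatest bound that is less than or equal to the probe, or -1 if there is none.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NKEYS 2

/* Hypothetical, simplified analogue of PartitionBoundCmpArg (datums only). */
typedef struct ToyCmpArg
{
	const int  *datums;			/* probe values */
	int			ndatums;		/* how many are given; may be < TOY_NKEYS */
} ToyCmpArg;

/* Compare a bound with the first arg->ndatums probe values: <0, 0 or >0. */
static int
toy_bound_cmp(const int *bound, const ToyCmpArg *arg)
{
	int			i;

	for (i = 0; i < arg->ndatums; i++)
	{
		if (bound[i] < arg->datums[i])
			return -1;
		if (bound[i] > arg->datums[i])
			return 1;
	}

	return 0;
}

/*
 * Return the offset of the greatest bound that is <= the probe, or -1 if all
 * bounds are greater.  *is_equal tells whether that bound compared equal.
 */
static int
toy_bound_bsearch(const int bounds[][TOY_NKEYS], int nbounds,
				  const ToyCmpArg *arg, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = toy_bound_cmp(bounds[mid], arg);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}

	return lo;
}
```

Passing the datum count inside the argument struct, rather than assuming the full key width, is what lets a caller probe with, say, only the first column of a two-column key.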
0002-Introduce-a-get_partitions_from_clauses-v17.patch (text/plain; charset=UTF-8)
From 8d627b910278203151853d324c3319c265cd36c0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
---
src/backend/catalog/partition.c | 2000 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 2009 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dc631b2761..9606ff57d0 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,80 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * method in use, so we store this information.
+ */
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +289,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_from_ne_clauses(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static int classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses, List **ne_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partattoff, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1578,9 +1685,1902 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of relation that will satisfy all
+ * the clauses contained in partclauses
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr = RelationGetPartitionQual(relation);
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partconstr)
+ {
+ PartitionBoundInfo boundinfo =
+ RelationGetPartitionDesc(relation)->boundinfo;
+
+ /*
+ * We need to worry about such a case only if the relation has a
+ * default partition to begin with.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ int nkeys;
+ bool constfalse;
+ List *or_clauses,
+ *ne_clauses;
+ ListCell *lc;
+
+ /*
+ * Reduce the set of clauses into a form that get_partitions_for_keys()
+ * can work with.
+ */
+ nkeys = classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses, &ne_clauses);
+
+ /*
+ * classify_partition_bounding_keys() may have found a false
+ * pseudo-constant clause that the planner didn't catch, or it may have
+ * itself found contradictions among the clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ /*
+ * If all clauses in the list were OR clauses,
+ * classify_partition_bounding_keys() wouldn't have formed keys yet. They
+ * will be handled below by recursively calling this function for each of
+ * OR clauses' arguments and combining the resulting partition sets
+ * appropriately.
+ */
+ if (nkeys > 0)
+ result = get_partitions_for_keys(relation, &keys);
+ else
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+
+ /* No point in trying to look at other conjunctive clauses. */
+ if (bms_is_empty(result))
+ return NULL;
+
+ /*
+ * Only keep the partitions in result that are not pruned by the clauses
+ * in ne_clauses.
+ */
+ if (ne_clauses)
+ {
+ Bitmapset *ne_clause_parts;
+
+ ne_clause_parts = get_partitions_from_ne_clauses(relation, ne_clauses);
+
+ /*
+ * Clauses in ne_clauses are in conjunction with the clauses that gave
+ * us the keys above, and hence with the partitions in result, so
+ * combine with result using set intersection.
+ */
+ result = bms_int_members(result, ne_clause_parts);
+ bms_free(ne_clause_parts);
+ }
+
+ /*
+ * Ditto, but this time or_clauses.
+ */
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_clause_args(relation, rt_index,
+ or->args);
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the other clauses above, so combine with result
+ * using set intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Assumes partkey exists in the scope and is of a list partitioned table. */
+#define partkey_datums_equal(d1, d2)\
+ (0 == DatumGetInt32(FunctionCall2Coll(&partkey->partsupfunc[0],\
+ partkey->partcollation[0],\
+ (d1), (d2))))
+/*
+ * Check if d is equal to some member of darray where equality is determined
+ * by the partitioning comparison function.
+ */
+static bool
+datum_in_array(PartitionKey partkey, Datum d, Datum *darray, int n)
+{
+ int i;
+
+ if (darray == NULL || n == 0)
+ return false;
+
+ for (i = 0; i < n; i++)
+ if (partkey_datums_equal(d, darray[i]))
+ return true;
+
+ return false;
+}
+
+/*
+ * count_partition_datums
+ *
+ * Returns the number of non-null datums allowed by a non-default list
+ * partition with given index.
+ */
+static int
+count_partition_datums(Relation rel, int index)
+{
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ result = 0;
+
+ Assert(index != boundinfo->default_index);
+
+ /*
+ * The answer is as many as the count of occurrence of the value index
+ * in boundinfo->indexes[].
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ if (index == boundinfo->indexes[i])
+ result += 1;
+
+ return result;
+}
+
+/*
+ * get_partitions_from_ne_clauses
+ *
+ * Return partitions of relation that satisfy all <> operator clauses in
+ * ne_clauses. Only ever called if relation is a list partitioned table.
+ */
+static Bitmapset *
+get_partitions_from_ne_clauses(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *result,
+ *excluded_parts;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Datum *exclude_datums;
+ int *count_excluded,
+ n_exclude_datums,
+ i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+
+ /*
+ * How this works:
+ *
+ * For each constant expression, we look up the partition that would
+ * contain its value and mark the same as excluded partition. After
+ * doing the same for all clauses we'll have set of partitions that
+ * are excluded. For each excluded partition, check if there exist
+ * values that it allows but are not specified in the clauses, if so
+ * the partition won't actually be excluded.
+ */
+
+ /* De-duplicate constant values. */
+ exclude_datums = (Datum *) palloc0(list_length(ne_clauses) *
+ sizeof(Datum));
+ n_exclude_datums = 0;
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum) &&
+ !datum_in_array(partkey, datum, exclude_datums, n_exclude_datums))
+ exclude_datums[n_exclude_datums++] = datum;
+ }
+
+ /*
+ * For each value, if it's found in boundinfo, increment the count of its
+ * partition as excluded due to that value.
+ */
+ count_excluded = (int *) palloc0(partdesc->nparts * sizeof(int));
+ for (i = 0; i < n_exclude_datums; i++)
+ {
+ int offset,
+ excluded_part;
+ bool is_equal;
+ PartitionBoundCmpArg arg;
+ Datum argdatums[] = {exclude_datums[i]};
+
+ memset(&arg, 0, sizeof(arg));
+ arg.datums = argdatums;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
+ {
+ excluded_part = boundinfo->indexes[offset];
+ count_excluded[excluded_part]++;
+ }
+ }
+
+ excluded_parts = NULL;
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ /*
+ * If all datums of this partition appeared in ne_clauses, exclude
+ * this partition.
+ */
+ if (count_excluded[i] > 0 &&
+ count_excluded[i] == count_partition_datums(relation, i))
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Also, exclude the "null-only" partition, because strict clauses in
+ * ne_clauses will not select any rows from it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ count_partition_datums(relation, boundinfo->null_index) == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(count_excluded);
+ pfree(exclude_datums);
+
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ result = bms_del_members(result, excluded_parts);
+ bms_free(excluded_parts);
+
+ return result;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to result. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+
+ /*
+ * Partition sets obtained from mutually-disjunctive clauses are
+ * combined using set union.
+ */
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument keys (number of keys is the return value)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing equality operator, unless hash
+ * partitioning is in use, in which case, it's possible that some keys have
+ * IS NULL clauses while remaining have clauses with equality operator.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ *
+ * Clauses containing a <> operator are added to *ne_clauses, provided its
+ * negator is a valid partitioning equality operator and that too only if
+ * list partitioning is in use.
+ */
+static int
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses,
+ List **ne_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool will_compute_keys = false;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ bool keyisnotnull[PARTITION_MAX_KEYS];
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *ne_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ /* false means we don't know if a given key is null */
+ memset(keyisnull, false, sizeof(keyisnull));
+ /* false means we don't know if a given key is not null */
+ memset(keyisnotnull, false, sizeof(keyisnotnull));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way.*/
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to arbitrary expressions, which get the
+ * current one from PartitionKey.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause not meant for this column. */
+ continue;
+
+ /*
+ * Handle some cases wherein the clause's operator may not
+ * belong to the partitioning operator family. For example,
+ * operators named '<>' are not listed in any operator
+ * family whatsoever. Also, ordering operators like '<' are
+ * not listed in the hash operator family.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne = true;
+ }
+
+ /*
+ * We're not going to turn this into a key as it is, either
+ * because this is an ordering op and hash partitioning is
+ * in use or we found a <> operator useful for pruning
+ * that will be handed over to the caller without turning
+ * it into a key. So, move on.
+ */
+ if (!is_ne)
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_from_ne_clauses().
+ */
+ if (is_ne)
+ *ne_clauses = lappend(*ne_clauses, pc);
+ else
+ {
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ keyisnotnull[i] = true;
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * in any operator family. We can still handle it if its
+ * negator is a member of the partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
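+ /*
+ * get_op_opfamily_properties() raises an error for an
+ * operator that is not in the family, so check first.
+ */
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;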
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /* Use a separate counter; i indexes the partition key column. */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = list_copy(arrexpr->elements);
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if ((IsA(arg, Var) &&
+ ((Var *) arg)->varattno == partattno) ||
+ equal(arg, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull[i] = true;
+ else
+ keyisnotnull[i] = true;
+ n_keynullness++;
+ will_compute_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (!will_compute_keys || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Generate bounding tuple(s).
+ *
+ * We look up partitions in the partition bound descriptor using, say,
+ * partition_bound_bsearch(), which expects a Datum (or Datums if multi-
+ * column key). So, extract the same out of the constant argument of
+ * each clause.
+ *
+ * Further, based on the strategies of clause operators (=, </<=, >/>=),
+ * try to construct tuples out of those datums that serve as the exact
+ * look-up tuple or minimum/maximum bounding tuple(s). If we find datums
+ * for all partition key columns that appear in = operator clauses, then
+ * we have the look-up tuple to be exactly matched, which will return just
+ * one partition if one exists. If the last value of the tuple comes from
+ * a </<= or >/>= operator, then that constitutes the maximum and minimum
+ * bounding tuple, respectively. There is one exception -- if the tuple
+ * constitutes a proper prefix of partition key columns, with none of its
+ * values coming from a </<= or >/>= operator, we consider such tuple both
+ * the minimum and maximum bounding tuple. For a multi-column range
+ * partitioned table, there usually exists a sequence of consecutive
+ * partitions that share a prefix of partition bound, which are all
+ * matched by a bounding tuple of the aforementioned shape.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found the same for all partition key columns.
+ * If present, we don't need minkeys and maxkeys anymore. In the case
+ * of hash partitioning, we don't require all equal keys to be operator
+ * clauses. For hash partitioning, any IS NULL clauses are considered
+ * as equal keys by the code performing actual pruning, at which time it
+ * is checked whether, along with any operator clauses, all partition key
+ * columns are covered.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ keys->keyisnull[i] = keyisnull[i];
+ keys->keyisnotnull[i] = keyisnotnull[i];
+ }
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * Returns PART_OP_LESS, PART_OP_EQUAL, or PART_OP_GREATER to signify that
+ * the partitioning clause has a </<=, =, or >/>= operator, respectively.
+ * Sets *incl to true if equality is implied.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ PartOpStrategy result;
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = PART_OP_EQUAL;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ result = PART_OP_LESS;
+ *incl = (op->op_strategy == BTLessEqualStrategyNumber);
+ break;
+ case BTEqualStrategyNumber:
+ result = PART_OP_EQUAL;
+ *incl = true;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ result = PART_OP_GREATER;
+ *incl = (op->op_strategy == BTGreaterEqualStrategyNumber);
+ break;
+ }
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value.
+ * Returns false if no constant value could be extracted.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partattoff,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partattoff])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partattoff], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key type, that is, the type of
+ * datums stored in PartitionBoundInfo, no hope of using this
+ * expression for anything partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ return false;
+ }
+
+ Assert(false); /* don't ever get here */
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partattoff,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partattoff],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partattoff,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value and
+ * so add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with the same. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /* The code below is for btree operators, which cur is not. */
+ continue;
+ }
+
+ /*
+ * Stuff that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partattoff,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partattoff,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partattoff,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partattoff];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partattoff,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partattoff,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor
+ * using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Hash partitioning puts nulls into a normal partition and doesn't
+ * require defining a special null-accepting partition.
+ * Caller didn't count nulls as a valid key; do so ourselves.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ if (keys->keyisnull[i])
+ keys->n_eqkeys++;
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keys->keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+
+ /* No partition has been defined for this remainder. */
+ return NULL;
+ }
+
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ int other_idx = -1;
+
+ /*
+ * Only a designated partition accepts nulls; if there is one,
+ * return it. Otherwise, nulls go to the default partition,
+ * if any.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * minoff set to -1 means all datums are greater than minkeys, which
+ * means all partitions satisfy minkeys. In that case, set minoff to
+ * the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if it does,
+ * but minkey isn't inclusive, move to the bound on the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid index into the list
+ * partition datums.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged, because
+ * it simply means none of the partitions satisfies maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal), but the
+ * maxkey is not inclusive, then go to the bound on left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exist some datums
+ * (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ Bitmapset *result = NULL;
+
+ /*
+ * All datums between those at minoff and maxoff satisfy the query
+ * keys, so add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ /* No datum satisfies the keys and there is no default partition. */
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (keys->keyisnull[i])
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && boundinfo->indexes[eqoff+1] >= 0)
+ {
+ /*
+ * eqoff gives us the bound that is known to be <= eqkeys,
+ * given how partition_bound_bsearch works. The bound at eqoff+1,
+ * then, would be the upper bound of the only partition that needs
+ * to be scanned.
+ */
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If only a prefix of the whole partition key is provided, there will
+ * be multiple partitions whose bounds share the same prefix. If minkey
+ * is inclusive, we must make minoff point to the leftmost such bound,
+ * making the result contain all such partitions. If it is exclusive,
+ * we must move minoff to the right such that minoff points to the
+ * first partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in the
+ * result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is known to
+ * be <= query's minkey. The bound at minoff + 1 (if there is one),
+ * then, would be the upper bound of the leftmost partition that needs
+ * to be scanned.
+ */
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ *
+ * There is one more index than there are range partition datums.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is known to
+ * be <= query's maxkey. The bound at maxoff+1, then, would be the
+ * upper bound of the rightmost partition that needs to be scanned.
+ * Although, if the bound is equal to maxkeys and the latter is not
+ * inclusive, then the bound at maxoff itself is the upper bound of
+ * the rightmost partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_def = false;
+ Bitmapset *result = NULL;
+
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of
+ * an unassigned range of values, move to the adjacent bound which must
+ * be the upper bound of the leftmost or rightmost partition,
+ * respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned. The
+ * default partition would've been included to cover those values.
+ * Although, if the original bound in question is an infinite value,
+ * there would not be any unassigned range to speak of, because the
+ * range is unbounded in that direction by definition, so no need to
+ * include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any non-default
+ * range partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!keys->keyisnotnull[i])
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 9ca384db51..93eb374343 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -149,8 +149,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4758,7 +4756,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2983cfa217..7a5ab45c5c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -71,4 +71,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 0d0ba7c66a..f2fddeceb8 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -187,4 +187,7 @@ DATA(insert OID = 4082 ( 3580 pg_lsn_minmax_ops PGNSP PGUID ));
DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index e3672218f3..1ef13a49de 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
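
As a rough illustration of the bound search at the end of the patch above (not code from the patch), here is a minimal sketch for a single integer key. It assumes a toy layout in which partition i holds keys in [bounds[i], bounds[i + 1]), with no gaps and no default partition; toy_bounds_le() and toy_prune_range() are hypothetical names. As in the patch, the inclusivity of the lower key only matters when a prefix of a multi-column key is given, so the sketch takes an inclusivity flag for the upper key only.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the number of bounds that are <= key, using binary search. */
int
toy_bounds_le(const int *bounds, int nbounds, int key)
{
    int lo = 0;
    int hi = nbounds;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (bounds[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Toy model (an assumption, not the patch's data structures): nbounds
 * sorted bounds describe nbounds - 1 contiguous partitions.  Sets *first
 * and *last to the partitions that may hold keys between minkey and
 * maxkey, and returns false if no partition qualifies.
 */
bool
toy_prune_range(const int *bounds, int nbounds,
                int minkey, int maxkey, bool max_incl,
                int *first, int *last)
{
    int nparts = nbounds - 1;
    int lo;
    int hi;

    assert(nbounds >= 2);

    /* Leftmost partition: its lower bound is the greatest bound <= minkey. */
    lo = toy_bounds_le(bounds, nbounds, minkey) - 1;
    if (lo < 0)
        lo = 0;                 /* minkey lies below every partition */

    /* Rightmost partition: the same search, this time for maxkey. */
    hi = toy_bounds_le(bounds, nbounds, maxkey) - 1;

    /* An exclusive maxkey equal to a bound excludes the partition above it. */
    if (hi >= 0 && !max_incl && bounds[hi] == maxkey)
        hi -= 1;
    if (hi > nparts - 1)
        hi = nparts - 1;        /* maxkey lies at or above the last bound */

    if (lo > nparts - 1 || hi < 0 || lo > hi)
        return false;

    *first = lo;
    *last = hi;
    return true;
}
```

With bounds {0, 10, 20, 30}, a query for 10 <= key < 20 selects only the middle partition, while 10 <= key <= 20 also selects the last one.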
Attachment: 0003-Move-some-code-of-set_append_rel_size-to-separate-fu-v17.patch (text/plain; charset=UTF-8)
From 597a7e5d98715c5133491dfe0890e73547e69d35 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes the basic properties of a partition's
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. The pairwise-join related code will need to call it
to minimally initialize partitions that earlier planning considered
pruned and hence left untouched. No partition is left untouched
currently, because the current pruning method touches each partition
(setting its basic properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
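
To make the attr_needed part of the moved code easier to follow, here is a minimal sketch (not code from the patch) of copying per-attribute "needed" information from a parent to a child through a translation array. It assumes a toy model in which attributes are numbered from 1, translated[p - 1] is the child attribute number for parent attribute p (0 for a column dropped from the parent), and a plain bitmask stands in for a Relids set. The name toy_copy_attr_needed() is hypothetical.

```c
#include <assert.h>

/*
 * Copy each parent attribute's "needed" mask to the matching child
 * attribute.  Dropped parent columns have no translation and are skipped.
 */
void
toy_copy_attr_needed(const unsigned *parent_needed, int parent_natts,
                     const int *translated,
                     unsigned *child_needed, int child_natts)
{
    int p;

    for (p = 1; p <= parent_natts; p++)
    {
        int c = translated[p - 1];

        if (c == 0)
        {
            /* A dropped column cannot be referenced, so nothing needs it. */
            assert(parent_needed[p - 1] == 0);
            continue;
        }

        assert(c >= 1 && c <= child_natts);
        child_needed[c - 1] = parent_needed[p - 1];
    }
}
```

The child may order its columns differently from the parent, which is why the copy goes through the translation array rather than using the same index on both sides.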
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e8463e4a3..86e7a20da9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 674cfc6b06..daa8f516ce 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 3ef12b323b..f183aacfb8 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -300,5 +300,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
Attachment: 0004-More-refactoring-around-partitioned-table-AppendPath-v17.patch (text/plain; charset=UTF-8)
From abcc84af9bed7003b39403915ccf42979390014e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 133 +++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 19 +++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 +++++-
4 files changed, 128 insertions(+), 56 deletions(-)
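
The difference between the two ways of finding a parent's children described in the commit message above can be sketched as follows (not code from the patch). ToyAppInfo is a hypothetical stand-in for AppendRelInfo that keeps only the two RT indexes, and both function names are made up for the illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for AppendRelInfo: just the two RT indexes. */
typedef struct ToyAppInfo
{
    int parent_relid;
    int child_relid;
} ToyAppInfo;

/*
 * Scanning approach: walk the global list and keep one parent's entries.
 * out[] must have room for nall pointers; returns the number of matches.
 */
int
toy_children_by_scan(const ToyAppInfo *all, int nall, int parent_relid,
                     const ToyAppInfo **out)
{
    int i;
    int n = 0;

    for (i = 0; i < nall; i++)
    {
        if (all[i].parent_relid != parent_relid)
            continue;           /* belongs to some other append rel */
        out[n++] = &all[i];
    }
    return n;
}

/*
 * Per-parent approach: the parent keeps its own array in partition-bound
 * order, and pruning only records which entries survive.  live[] must
 * have room for nparts pointers; returns the number of live partitions.
 */
int
toy_collect_live(const ToyAppInfo **part_appinfos, int nparts,
                 const int *pruned, const ToyAppInfo **live)
{
    int i;
    int n = 0;

    for (i = 0; i < nparts; i++)
    {
        if (!pruned[i])
            live[n++] = part_appinfos[i];
    }
    return n;
}
```

The second function never looks at entries that belong to other parents, and its output contains only the partitions that survived pruning.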
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 86e7a20da9..83f79ea6cb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,22 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
+ /*
+ * If childrel is itself partitioned, add it and its partitioned
+ * children to the list being propagated up to the root rel. Note
+ * that rel (the parent) might just be a union all subquery, in which
+ * case, there is nothing to do here.
+ */
+ if (IS_PARTITIONED_REL(childrel) && IS_PARTITIONED_REL(rel))
+ {
+ rel->live_partitioned_rels =
+ list_concat(rel->live_partitioned_rels,
+ list_copy(childrel->live_partitioned_rels));
+ }
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1220,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach(l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1312,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1362,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 382791fadb..ffdf9c5247 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index daa8f516ce..dcfda1c3cc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 3b9d303ce4..83bcdba29b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the same
+ * order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
Attachment: 0005-Teach-planner-to-use-get_partitions_from_clauses-v17.patch (text/plain; charset=UTF-8)
From a20165dda5dcd5173dbb731a746b2f0ca9673fb6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
applies constraint exclusion against the partition constraint of each
partition: it compares the query's clauses against the partition
constraint and excludes the partition if the clauses refute the
constraint. A dummy path is added for each partition that is excluded.
This algorithm takes linear time with a big constant, especially given
that we repeat the work of matching clauses to the partition
constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
src/backend/optimizer/path/allpaths.c | 406 ++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 924 insertions(+), 78 deletions(-)
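
The contrast drawn in the commit message above, testing every partition in turn versus looking the key up once, can be sketched for a single "key = constant" clause as follows (not code from the patch). It assumes a toy model of list partitioning in which values[] is sorted and part_of[i] is the partition that accepts values[i]; both function names are hypothetical, and default_part is whatever the caller wants returned when no listed value matches.

```c
#include <assert.h>

/* Per-partition style: test the key against each partition in turn. */
int
toy_list_partition_linear(const int *values, const int *part_of,
                          int nvalues, int nparts,
                          int default_part, int key)
{
    int p;
    int i;

    for (p = 0; p < nparts; p++)
    {
        for (i = 0; i < nvalues; i++)
        {
            if (part_of[i] == p && values[i] == key)
                return p;
        }
    }
    return default_part;
}

/* Lookup style: binary-search the sorted values once. */
int
toy_list_partition_bsearch(const int *values, const int *part_of,
                           int nvalues, int default_part, int key)
{
    int lo = 0;
    int hi = nvalues - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (values[mid] == key)
            return part_of[mid];
        if (values[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return default_part;
}
```

Both functions return the same partition for any key. The first does work proportional to the number of partitions times the number of listed values, while the second does a logarithmic number of comparisons regardless of how many partitions exist.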
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 83f79ea6cb..eeaf8fd935 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,397 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key, including any
+ * nullness tests against partition keys, and learn whether any of them
+ * has a constant operand or is constant-false.
+ */
+ partclauses = match_clauses_to_partkey(root, rel,
+ list_copy(rel->baserestrictinfo),
+ &contains_const,
+ &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ /*
+ * If we have matched clauses that contain at least one constant operand,
+ * then use these to prune partitions.
+ */
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * Else there are no clauses that are useful to prune any partitions,
+ * so we must scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately.
+ *
+ * If a NullTest against a partition key is encountered, it's added to the
+ * result as well.
+ *
+ * If the clauses contain at least one constant operand or a nullness test,
+ * *contains_const is set so that the caller can pass the clauses to the
+ * partitioning module right away.
+ *
+ * If the list contains a pseudo-constant RestrictInfo with constant false
+ * value, *constfalse is set.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids,
+ left_relids,
+ right_relids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+ left_relids = pull_varnos((Node *) leftop);
+ right_relids = pull_varnos((Node *) rightop);
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = right_relids;
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = left_relids;
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that the behavior with null
+ * arguments is well defined.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(estimate_expression_value(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1289,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 5e03f8bc21..5bd30312cb 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even a dead partition look
+ * minimally valid, with a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f7438714c4..f2ab2de079 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1247,22 +1246,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1929,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 83bcdba29b..9f0b6575ca 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store the partitioning collation separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..ad29f0f125 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..6921e39bfd 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+/* partial keys won't prune, nor would non-equality conditions */
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+/* pruning should work in all cases below */
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 22 December 2017 at 17:25, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached updated patches.
Hi Amit,
I've just completed a pass over the v17 patch set. I've found a number
of things that need to be addressed. Some might seem a bit nit-picky,
sorry about that. However, many of the others genuinely need to be
addressed.
1. The following code calls RelationGetPartitionQual, but the output
of that is only needed when partition_bound_has_default(boundinfo) is
true.
Can you change this to only get the RelationGetPartitionQual when it's required?
Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
ParamListInfo prmlist, ExprContext *econtext,
List *partclauses)
{
Bitmapset *result;
List *partconstr = RelationGetPartitionQual(relation);
2. The header comment in match_clauses_to_partkey() does not give any
warning that 'clauses' is modified within the function.
The function should create a copy of the clauses before modifying
them. This will save you having to do any list_copy calls when you're
calling the function.
The header comment is also not very clear about what the return value
of the function is.
3. "method" I think should be "strategy". We've pretty much
standardised on that term everywhere else, so let's keep to standard.
/*
* Does the query specify a key to be null or not null? Partitioning
* handles null partition keys specially depending on the partitioning
* method in use, we store this information.
*/
4. "relation" should be in single quotes, since you're talking about
the parameter named "relation". Likewise with "partclauses", otherwise
it just seems like bad English.
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
5. partdesc's assignment can be delayed until it's needed. This will
save generating it when constfalse == true
static Bitmapset *
get_partitions_from_clauses_recurse(Relation relation, int rt_index,
List *clauses)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
6. In the following comment, I'd have expected almost identical code,
just with some other List, but the code probably differs a bit too
much to use "Ditto".
/*
* Ditto, but this time or_clauses.
*/
7. Comment claims we use "set union", but we're really just collecting
the members from the other set in:
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
result = bms_add_members(result, arg_partset);
8. These arrays could just be initialized up to partkey->partnatts.
I'd imagine most of the time this will be just 1, so would save
needlessly setting the 31 other elements, although, perhaps it's a bad
idea to optimize this.
memset(keyclauses_all, 0, sizeof(keyclauses_all));
/* false means we don't know if a given key is null */
memset(keyisnull, false, sizeof(keyisnull));
/* false means we don't know if a given key is not null */
memset(keyisnotnull, false, sizeof(keyisnull));
The last two of these could just be Bitmapsets and you'd not need any
memset at all. PARTITION_MAX_KEYS just so happens to be the same as
BITS_PER_BITMAPWORD in a standard build, so you'd always be able to
mark everything with a single bitmap word. This would help a bit in
the various places that you're counting the true elements, for
example, the following code:
for (i = 0; i < partkey->partnatts; i++)
{
if (!keys->keyisnotnull[i])
{
include_def = true;
break;
}
}
could become:
include_def = !bms_is_empty(keys->keyisnotnull);
if you converted all these to Bitmapsets.
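A minimal sketch of the idea in item 8, using a plain 32-bit word as a stand-in for PostgreSQL's Bitmapset. The names key_mask_t, mark_key, all_keys and any_key_unmarked are hypothetical and do not come from the patch. One caveat: the one-liner quoted above tests whether the set is non-empty, whereas the loop it replaces asks whether any of the first partnatts keys is still unmarked. The sketch follows the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_KEYS 32             /* stand-in for PARTITION_MAX_KEYS */

typedef uint32_t key_mask_t;    /* hypothetical: one bit per partition key */

/* Record that key number 'i' is known to be NOT NULL. */
static key_mask_t
mark_key(key_mask_t mask, int i)
{
    assert(i >= 0 && i < MAX_KEYS);
    return mask | ((key_mask_t) 1 << i);
}

/* Build a mask with the low 'nkeys' bits set. */
static key_mask_t
all_keys(int nkeys)
{
    assert(nkeys >= 1 && nkeys <= MAX_KEYS);
    return nkeys == MAX_KEYS ? UINT32_MAX : (((key_mask_t) 1 << nkeys) - 1);
}

/* Same answer as the loop: true if any of the first nkeys keys is unmarked. */
static bool
any_key_unmarked(key_mask_t mask, int nkeys)
{
    key_mask_t all = all_keys(nkeys);

    return (mask & all) != all;
}
```

With a real Bitmapset the same test could be phrased as comparing the member count against partkey->partnatts. No memset is needed either way, since an empty mask is just zero.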
9. The following comment would be better read as: /* clause does not
match this partition key */
/* Clause not meant for this column. */
10. The following comment talks about handling less than operators for
hash opfamilies, but the code only handles <> for btree and list
partitioning.
* Handle some cases wherein the clause's operator may not
* belong to the partitioning operator family. For example,
* operators named '<>' are not listed in any operator
* family whatsoever. Also, ordering opertors like '<' are
* not listed in the hash operator family.
"opertors" should be spelled "operators"
11. In the following comment "operator" should be "operators":
* are constrained by clauses containing equality operator, unless hash
Likewise for:
* IS NULL clauses while remaining have clauses with equality operator.
12. The following code in classify_partition_bounding_keys probably
should use EXPR_MATCHES_PARTKEY.
/* Does leftop match with this partition key column? */
if ((IsA(arg, Var) &&
((Var *) arg)->varattno == partattno) ||
equal(arg, partexpr))
13. The comment for classify_partition_bounding_keys does not seem to
define well what the meaning of the return value is:
* classify_partition_bounding_keys
* Classify partition clauses into equal, min, and max keys, along with
* any Nullness constraints and return that information in the output
* argument keys (number of keys is the return value)
I had thought all along that this must mean the number of distinct
partition keys that we've found useful clauses for, but it's possible
to fool this with duplicate IS NULL checks.
create table hp (a int, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);
explain select * from hp where a is null and a is null;
This causes classify_partition_bounding_keys to return 2. I can't see
any bad side effects of this, but I think the comment needs to be
improved. Perhaps it's better just to make the function return bool,
returning true if there are any useful keys?
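To illustrate the difference between "number of matched clauses" and "number of distinct keys" in item 13, here is a small sketch that counts each key only once. The function name and the array-of-key-indexes input are assumptions made for the example. The patch works on clause lists, not on an int array.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many distinct partition keys are referenced by the clauses.
 * clause_keys[i] is the (hypothetical) key index that clause i matched.
 */
static int
count_distinct_keys(const int *clause_keys, int nclauses)
{
    uint32_t seen = 0;          /* one bit per key already counted */
    int      ndistinct = 0;

    for (int i = 0; i < nclauses; i++)
    {
        uint32_t bit;

        assert(clause_keys[i] >= 0 && clause_keys[i] < 32);
        bit = (uint32_t) 1 << clause_keys[i];
        if ((seen & bit) == 0)
        {
            seen |= bit;
            ndistinct++;
        }
    }
    return ndistinct;
}
```

For the query above, both IS NULL clauses match key 0, so this returns 1 rather than 2. Returning a plain bool, as suggested, would sidestep the question entirely.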
14. The switch statement in partition_op_strategy could be simplified
a bit and be written more as:
case BTLessEqualStrategyNumber:
*incl = true;
/* fall through */
case BTLessStrategyNumber:
result = PART_OP_LESS;
break;
15. No default case statements in partition_op_strategy() could result
in "result" not being set.
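Items 14 and 15 together might look roughly like the sketch below. The strategy numbers mirror PostgreSQL's btree strategy numbers, and PART_OP_LESS is taken from the snippet above. The other enum members, PART_OP_INVALID in particular, and the function name are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the btree strategy numbers, so the sketch is standalone. */
enum
{
    BTLessStrategyNumber = 1,
    BTLessEqualStrategyNumber = 2,
    BTEqualStrategyNumber = 3,
    BTGreaterEqualStrategyNumber = 4,
    BTGreaterStrategyNumber = 5
};

typedef enum
{
    PART_OP_INVALID,            /* hypothetical "no match" value */
    PART_OP_LESS,
    PART_OP_EQUAL,              /* name assumed */
    PART_OP_GREATER             /* name assumed */
} PartOpStrategy;

/* Map a btree strategy to a pruning strategy; *incl says if equality is implied. */
static PartOpStrategy
btree_op_strategy(int strategy, bool *incl)
{
    PartOpStrategy result;

    assert(incl != NULL);
    *incl = false;

    switch (strategy)
    {
        case BTLessEqualStrategyNumber:
            *incl = true;
            /* fall through */
        case BTLessStrategyNumber:
            result = PART_OP_LESS;
            break;
        case BTEqualStrategyNumber:
            *incl = true;
            result = PART_OP_EQUAL;
            break;
        case BTGreaterEqualStrategyNumber:
            *incl = true;
            /* fall through */
        case BTGreaterStrategyNumber:
            result = PART_OP_GREATER;
            break;
        default:
            /* guarantees 'result' is always set (item 15) */
            result = PART_OP_INVALID;
            break;
    }
    return result;
}
```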
16. I'm unsure what the following comment means:
* To set eqkeys, we must have found the same for partition key columns.
The word "for" seems wrong, or I'm not sure what it wants to ensure it
finds the same for.
17. Header comment for partition_op_strategy is out-of-date.
/*
* Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
* =, and >/>= operator, respectively. Sets *incl to true if equality is
* implied.
*/
It'll never return -1.
18. The following comment has some grammar issues:
/*
* If couldn't coerce to the partition key type, that is, the type of
* datums stored in PartitionBoundInfo, no hope of using this
* expression for anything partitioning-related.
*/
Would be better with:
/*
* If we couldn't coerce the partition key type, that is, the type
* of datums stored in PartitionBoundInfo, then there's no hope of
* using this expression for anything partitioning-related.
*/
19. In partkey_datum_from_expr() the Assert(false) at the end seems
overkill. You could just get rid of the default in the switch
statement and have it fall through to the final return. This would
save 4 lines of code.
20. The word "and" I think needs removed from the following comment:
* Couldn't compare; keep hash_clause set to the previous value and
* so add this one directly to the result. Caller would
Probably also needs a comma after "value"
21. The following comment seems to indicate 'cur' is an operator, but
it's a PartClause:
/* The code below is for btree operators, which cur is not. */
It might be better to write
/* The code below handles Btree operators which are not relevant to a
hash-partitioned table. */
22. "Stuff" should be "The code" in:
* Stuff that follows closely mimics similar processing done by
23. In remove_redundant_clauses() and various other places, you have a
variable named partattoff. What's the meaning of this name? I'd
imagine it is short for "partition attribute offset", but it's a
"partition key index", so why not "partkeyidx"?
Of course, you might claim that an array index is just an offset, but
it is a little confusing if you think of attribute offsets in a
TupleDesc.
24. In get_partitions_for_keys() you're using "return" and "break" in the
switch statement. You can remove the breaks.
25. The Assert(false) in get_partitions_for_keys seems overkill. I
think it's fine to just do:
return NULL; /* keep compiler quiet */
26. The following code comment in get_partitions_for_keys_hash() does
not seem very well written:
* Hash partitioning handles puts nulls into a normal partition and
* doesn't require to define a special null-accpting partition.
* Caller didn't count nulls as a valid key; do so ourselves.
Maybe "puts" should be "storing"?
Also, does hash partitioning actually support a NULL partition? This
seems to say it doesn't require, but as far as I can see it does not
*support* a NULL partition. The comment is a bit misleading.
"accpting" should be "accepting".
27. In get_partitions_for_keys_hash() why is it possible to get a
result_index below 0? In that case, the code will end up triggering
the Assert(false), but if you want to Assert something here then maybe
it should be Assert(result_index >= 0)?
if (result_index >= 0)
return bms_make_singleton(result_index);
28. Is there any point in the following loop in get_partitions_for_keys_list()?
/*
* We might be able to get the answer sooner based on the nullness of
* keys, so get that out of the way.
*/
for (i = 0; i < partkey->partnatts; i++)
{
Won't partkey->partnatts always be 1? Maybe you can Assert that
instead, just in case someone forgets to change the code if LIST is to
support multiple partition keys.
29. In get_partitions_for_keys_list and get_partitions_for_keys_range
partdesc is only used for an Assert. Maybe you can just:
Assert(RelationGetPartitionDesc(rel)->nparts > 0);
30. The Assert(false) in get_partitions_for_keys_list() looks
suspiciously easy to hit...
create table lp (a int not null) partition by list(a);
create table lp1 partition of lp for values in(1);
explain select * from lp where a > 11; -- Assert fail!
31. ranget?
* get_partitions_for_keys_range
* Return partitions of a ranget partitioned table for requested keys
32. I'm not quite sure what this comment is saying:
/*
* eqoff is gives us the bound that is known to be <= eqkeys,
* given how partition_bound_bsearch works. The bound at eqoff+1,
* then, would be the upper bound of the only partition that needs
* to be scanned.
*/
33. "one" -> "the one"
* Find the leftmost bound that satisfies the query, i.e., one that
* satisfies minkeys.
34. Why "will" there be multiple partitions sharing the same prefix?
* If only a prefix of the whole partition key is provided, there will
* be multiple partitions whose bound share the same prefix. If minkey
Perhaps "will" should be "may"?
35. I think "would've" in the following comment is not correct. This
seems to indicate that this has already happened, which it has not.
"will have to be" might be more correct?
* satisfy the query, but don't have a valid partition assigned. The
* default partition would've been included to cover those values.
36. "will" -> "with"?
* Since partition keys will nulls are mapped to default range
37. "If no" -> "If there's no"
* If no tuple datum to compare with the bound, consider
* the latter to be greater.
38. I don't see anything about setting keynullness here:
/*
* Get the clauses that match the partition key, including information
* about any nullness tests against partition keys. Set keynullness to
* a invalid value of NullTestType, which 0 is not.
*/
partclauses = match_clauses_to_partkey(root, rel,
list_copy(rel->baserestrictinfo),
&contains_const,
&constfalse);
39. "paritions" -> "partitions"
* Else there are no clauses that are useful to prune any paritions,
40. In match_clauses_to_partkey, the following code pulls the varnos
from each operand of the expression. It would be better to just pull
the side that's needed (if any) a bit later rather than always doing
both.
left_relids = pull_varnos((Node *) leftop);
right_relids = pull_varnos((Node *) rightop);
41. I think a list_copy is missing here:
/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
childrel->live_partitioned_rels);
Surely if we don't perform a list_copy of
childrel->live_partitioned_rels then subsequent additions to the
resulting list will inadvertently add new items to the
childrel->live_partitioned_rels?
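The aliasing concern in item 41 can be shown with a simplified model of a cell-linked list. This is not PostgreSQL's List code. IntList, append_int, concat_destructive, copy_list and walk_count are hypothetical names, and the sketch never frees memory. It only assumes that concatenation links the second list's cells onto the first list instead of copying them.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

typedef struct IntList
{
    Cell *head;
    Cell *tail;
    int   length;
} IntList;

/* Append one value; a NULL list stands for the empty list. */
static IntList *
append_int(IntList *list, int value)
{
    Cell *cell = malloc(sizeof(Cell));

    assert(cell != NULL);
    cell->value = value;
    cell->next = NULL;
    if (list == NULL)
    {
        list = malloc(sizeof(IntList));
        assert(list != NULL);
        list->head = list->tail = cell;
        list->length = 1;
        return list;
    }
    list->tail->next = cell;
    list->tail = cell;
    list->length++;
    return list;
}

/* Link b's cells onto a without copying them: b's cells are now shared. */
static IntList *
concat_destructive(IntList *a, IntList *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    a->tail->next = b->head;
    a->tail = b->tail;
    a->length += b->length;
    return a;
}

/* Build a fresh list with the same values. */
static IntList *
copy_list(const IntList *src)
{
    IntList *dst = NULL;

    for (const Cell *c = src ? src->head : NULL; c != NULL; c = c->next)
        dst = append_int(dst, c->value);
    return dst;
}

/* Count cells by following next pointers, as a list walk would. */
static int
walk_count(const IntList *list)
{
    int n = 0;

    for (const Cell *c = list ? list->head : NULL; c != NULL; c = c->next)
        n++;
    return n;
}
```

In this model, appending to the concatenated list after concat_destructive(parent, child) makes a walk over child see the new cell as well, even though child->length is unchanged. Passing copy_list(child) instead keeps the child's list intact.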
42. Is this because of an existing bug?
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted
where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
43. In partition_prune.sql you have a mix of /* */ and -- comments.
Please just use --
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 28 December 2017 at 15:07, David Rowley <david.rowley@2ndquadrant.com> wrote:
43. In partition_prune.sql you have a mix of /* */ and -- comments.
Please just use --
Just a few extras that I found:
44. In match_clauses_to_partkey you're making use of
estimate_expression_value(), I don't think this is safe.
if (IsA(estimate_expression_value(root, rightop), Const))
*contains_const = true;
The only other places I see using this in the planner are for costing
purposes. Also, the header comment for that function says it's not
safe. Particularly "This effectively means that we plan using the
first supplied value of the Param.". If that's the case, then if we're
planning a generic plan, then wouldn't it be possible that the planner
chooses the current supplied parameter value and prune away partitions
based on that value. That would make the plan invalid for any other
parameter, but it's meant to be a generic plan, so we can't do that.
45. Why use a list_copy() here?
/*
* For a nested ArrayExpr, we don't know how to get the
* actual scalar values out into a flat list, so we give
* up doing anything with this ScalarArrayOpExpr.
*/
if (arrexpr->multidims)
continue;
elem_exprs = list_copy(arrexpr->elements);
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I happened to notice that Ashutosh's patch series at
/messages/by-id/CAFjFpReJhFSoy6DqH0ipFSHd=sLNEkSzAtz4VWCaS-w2jZL=uw@mail.gmail.com
has a 0001 patch that modifies the partition_bound_cmp stuff too.
Are those conflicting?
Ashutosh's commit message:
Modify bound comparision functions to accept members of PartitionKey
Functions partition_bound_cmp(), partition_rbound_cmp() and
partition_rbound_datum_cmp() are required to merge partition bounds
from joining relations. While doing so, we do not have access to the
PartitionKey of either relations. So, modify these functions to accept
only required members of PartitionKey so that the functions can be
reused for merging bounds.
Amit's:
Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specifed number of datums, instead of
key->partnatts datums.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Dec 29, 2017 at 6:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I happened to notice that Ashutosh's patch series at
/messages/by-id/CAFjFpReJhFSoy6DqH0ipFSHd=sLNEkSzAtz4VWCaS-w2jZL=uw@mail.gmail.com
has a 0001 patch that modifies the partition_bound_cmp stuff too.
Are those conflicting?
Ashutosh's commit message:
Modify bound comparision functions to accept members of PartitionKey
Functions partition_bound_cmp(), partition_rbound_cmp() and
partition_rbound_datum_cmp() are required to merge partition bounds
from joining relations. While doing so, we do not have access to the
PartitionKey of either relations. So, modify these functions to accept
only required members of PartitionKey so that the functions can be
reused for merging bounds.
Amit's:
Some interface changes for partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specifed number of datums, instead of
key->partnatts datums.
I haven't looked at Amit's changes, but we need a more flexible way to
pass information required for datum comparison than using
PartitionKey, since that's not available in the optimizer and cannot
be associated with join or aggregate relations. If we pass that
information through a structure, there are two ways:
1. it will need to be part of PartitionScheme; I am not sure if we can
have a substructure in PartitionKey. But if we can do it that way, we
can pass that structure to the functions.
2. we will need to construct the structure filling it with comparison
information and pass it to the comparison functions. I think what we
achieve out of this isn't worth the code we will need to add.
I would prefer the first approach over the other.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 8d627b910278203151853d324c3319c265cd36c0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
This one fails to apply. Please rebase.
Did you know you can use "git format-patch -v6" to generate
appropriately named patch files without having to rename them yourself?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Amit,
On 12/21/2017 11:25 PM, Amit Langote wrote:
Thanks again.
Please find attached updated patches.
I have been looking at this patch from a simple hash partition point of
view.
-- ddl.sql --
CREATE TABLE t1 (
a integer NOT NULL,
b integer NOT NULL
) PARTITION BY HASH (b);
CREATE TABLE t1_p00 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 0);
CREATE TABLE t1_p01 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 1);
CREATE TABLE t1_p02 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 2);
CREATE TABLE t1_p03 PARTITION OF t1 FOR VALUES WITH (MODULUS 4,
REMAINDER 3);
CREATE INDEX idx_t1_b_a_p00 ON t1_p00 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p01 ON t1_p01 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p02 ON t1_p02 USING btree (b, a);
CREATE INDEX idx_t1_b_a_p03 ON t1_p03 USING btree (b, a);
INSERT INTO t1 (SELECT i, i FROM generate_series(1, 1000000) AS i);
ANALYZE;
-- ddl.sql --
with
-- select.sql --
\set b random(1, 1000000)
BEGIN;
SELECT t1.a, t1.b FROM t1 WHERE t1.b = :b;
COMMIT;
-- select.sql --
using pgbench -c X -j X -M prepared -T X -f select.sql part-hash
On master we have generic_cost planning cost of 33.75, and an
avg_custom_cost of 51.25 resulting in use of the generic plan and a TPS
of 8893.
Using v17 we have generic_cost planning cost of 33.75, and an
avg_custom_cost of 25.9375 resulting in use of the custom plan and a TPS
of 7129 - of course due to the generation of a custom plan for each
invocation.
Comparing master with a non-partitioned scenario; we have a TPS of
12968, since there is no overhead of ExecInitAppend (PortalStart) and
ExecAppend (PortalRun).
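The switch between generic and custom plans reported above is consistent with a simple rule: build custom plans for the first few executions, then keep using them only while the generic plan is not cheaper than their average cost. The sketch below models that rule. The function name and the trial count of 5 are assumptions for illustration and are not quoted from the plan cache code.

```c
#include <assert.h>
#include <stdbool.h>

#define CUSTOM_PLAN_TRIALS 5    /* assumed number of initial custom plans */

/*
 * Return true if the next execution should build a custom plan.
 * total_custom_cost is the summed cost of the custom plans built so far.
 */
static bool
prefer_custom_plan(int num_custom_plans, double total_custom_cost,
                   double generic_cost)
{
    double avg_custom_cost;

    assert(num_custom_plans >= 0);

    /* Not enough samples yet: keep generating custom plans. */
    if (num_custom_plans < CUSTOM_PLAN_TRIALS)
        return true;

    avg_custom_cost = total_custom_cost / num_custom_plans;

    /* Prefer the generic plan only if it is strictly cheaper. */
    return !(generic_cost < avg_custom_cost);
}
```

Under this rule the master figures (33.75 against 51.25) select the generic plan, while the v17 figures (33.75 against 25.9375) select a custom plan on every execution.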
Could you share your thoughts on
1) if the generic plan mechanics should know about the pruning and hence
give a lower planner cost
2) if the patch should be more aggressive in removing planning nodes
that aren't necessary, e.g. going from Append -> IndexOnly to just
IndexOnly.
I have tested with both [1] and [2], but would like to know about your
thoughts on the above first.
Thanks in advance !
[1]: https://commitfest.postgresql.org/16/1330/
[2]: https://commitfest.postgresql.org/16/1353/
Best regards,
Jesper
On 5 January 2018 at 07:16, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
2) if the patch should be more aggressive in removing planning nodes that
aren't necessary, e.g. going from Append -> IndexOnly to just IndexOnly.
That's not for this patch. There's another patch [1] to do that already.
[1]: https://commitfest.postgresql.org/16/1353/
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 5 January 2018 at 07:16, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
\set b random(1, 1000000)
BEGIN;
SELECT t1.a, t1.b FROM t1 WHERE t1.b = :b;
COMMIT;
-- select.sql --
using pgbench -c X -j X -M prepared -T X -f select.sql part-hash
On master we have generic_cost planning cost of 33.75, and an
avg_custom_cost of 51.25 resulting in use of the generic plan and a TPS of
8893.
Using v17 we have generic_cost planning cost of 33.75, and an
avg_custom_cost of 25.9375 resulting in use of the custom plan and a TPS of
7129 - of course due to the generation of a custom plan for each invocation.
Comparing master with a non-partitioned scenario; we have a TPS of 12968,
since there is no overhead of ExecInitAppend (PortalStart) and ExecAppend
(PortalRun).
Could you share your thoughts on
1) if the generic plan mechanics should know about the pruning and hence
give a lower planner cost
I think the problem here is that cached_plan_cost() is costing the
planning cost of the query too low. If this were costed higher, then it's
more likely the generic plan would have been chosen, instead of
generating a custom plan each time.
How well does it perform if you change cpu_operator_cost = 0.01?
I think cached_plan_cost() does need an overhaul, but I think it's not
anything that should be done as part of this patch. You've picked HASH
partitioning here just because the current master does not perform any
partition pruning for that partitioning strategy.
There also might be a tiny argument here to have some method of
disabling the planner's partition pruning as we could before with SET
constraint_exclusion = 'off', but I think that's about the limit of
the interest this patch should have in that problem.
(The problem gets more complex again when doing run-time pruning, but
that's not a topic for this thread)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David,
On 01/04/2018 09:21 PM, David Rowley wrote:
On 5 January 2018 at 07:16, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
Could you share your thoughts on
1) if the generic plan mechanics should know about the pruning and hence
give a lower planner cost
I think the problem here is that cached_plan_cost() is costing the
planning cost of the query too low. If this were costed higher, then it's
more likely the generic plan would have been chosen, instead of
generating a custom plan each time.
How well does it perform if you change cpu_operator_cost = 0.01?
It gives 38.82 for generic_cost, and 108.82 for avg_custom_cost on
master (8249 TPS). And, 38.82 for generic_cost, and 79.705 for
avg_custom_cost with v17 (7891 TPS). Non-partitioned is 11722 TPS.
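The hash-partitioning figures in this thread fit a simple cost model: a custom plan is charged its own plan cost plus a planning-effort term proportional to cpu_operator_cost. The sketch below assumes that term has the form 1000 * cpu_operator_cost * (nrelations + 1). Both the formula and the value nrelations = 6 are assumptions chosen because they fit the reported numbers, as is the guess that the pruned v17 plan costs a quarter of the unpruned one (one of four partitions scanned).

```c
#include <assert.h>

/* Assumed planning-effort charge added to each custom plan's cost. */
static double
planning_charge(double cpu_operator_cost, int nrelations)
{
    assert(nrelations >= 0);
    return 1000.0 * cpu_operator_cost * (nrelations + 1);
}

/* Cost attributed to one custom plan under that assumption. */
static double
custom_plan_cost(double plan_cost, double cpu_operator_cost, int nrelations)
{
    return plan_cost + planning_charge(cpu_operator_cost, nrelations);
}

/* Tolerant comparison, since these are floating-point sums. */
static int
close_to(double got, double expected)
{
    double diff = got - expected;

    return diff > -1e-9 && diff < 1e-9;
}
```

With the default cpu_operator_cost of 0.0025 the charge is 17.5, giving 33.75 + 17.5 = 51.25 on master and 33.75 / 4 + 17.5 = 25.9375 with v17. At cpu_operator_cost = 0.01 the charge is 70, giving 38.82 + 70 = 108.82 and 38.82 / 4 + 70 = 79.705. If the model holds, the charge is the same with and without pruning, which is why raising cpu_operator_cost alone tips the choice back to the generic plan.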
I think cached_plan_cost() does need an overhaul, but I think it's not
anything that should be done as part of this patch. You've picked HASH
partitioning here just because the current master does not perform any
partition pruning for that partitioning strategy.
Well, I mainly picked HASH because that is my use-case :)
For a range based setup it gives 39.84 for generic_cost, and 89.705 for
avg_custom_cost (7862 TPS).
Best regards,
Jesper
On 2018/01/05 1:28, Alvaro Herrera wrote:
From 8d627b910278203151853d324c3319c265cd36c0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH 2/5] Introduce a get_partitions_from_clauses()
This one fails to apply. Please rebase.
Sorry about the absence in the last few days. I will post a new version
addressing various review comments by the end of this week.
Did you know you can use "git format-patch -v6" to generate
appropriately named patch files without having to rename them yourself?
Oh, didn't know that trick. Will try, thanks.
Regards,
Amit
On 9 January 2018 at 21:40, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Sorry about the absence in the last few days. I will post a new version
addressing various review comments by the end of this week.
Good to have you back.
There's a small problem with get_partitions_for_keys_list(): the
following case hits the Assert(false) at the bottom of that function.
create table ab_c (a int not null, b char) partition by list(a);
create table abc_a2 (b char, a int not null) partition by list(b);
create table abc_a2_b3 partition of abc_a2 for values in ('3');
alter table ab_c attach partition abc_a2 for values in (2);
select * from ab_c where a between 1 and 2 and b <= '2';
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 January 2018 at 21:40, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Sorry about the absence in the last few days. I will post a new version
addressing various review comments by the end of this week.
One more thing I discovered while troubleshooting a bug Beena reported
in the run-time partition pruning patch is that
classify_partition_bounding_keys properly does:
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
constexpr = leftop;
else
/* Clause not meant for this column. */
continue;
for OpExpr clauses, but does not do the same for leftop for the
ScalarArrayOpExpr test.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 January 2018 at 21:40, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Sorry about the absence in the last few days. I will post a new version
addressing various review comments by the end of this week.
I'm sorry for the flood of emails today.
Beena's tests on the run-time partition pruning patch also indirectly
exposed a problem with this patch.
Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.
A test case is:
create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list (b);
create table part2 partition of part1 for values in(1);
select * from part;
partition_rels ends up with 3 items in the list, but there's only 2
partitions here. The reason for this is that, since planning here is
recursively calling add_paths_to_append_rel, the list for part ends up
with itself and part1 in it, then since part1's list already contains
itself, per set_append_rel_size's "rel->live_partitioned_rels =
list_make1_int(rti);", then part1 ends up in the list twice.
It would be nicer if you could use a Relids for this, but you'd also
need some way to store the target partition relation since
nodeModifyTable.c does:
/* The root table RT index is at the head of the partitioned_rels list */
if (node->partitioned_rels)
{
Index root_rti;
Oid root_oid;
root_rti = linitial_int(node->partitioned_rels);
root_oid = getrelid(root_rti, estate->es_range_table);
rel = heap_open(root_oid, NoLock); /* locked by InitPlan */
}
You could also fix it by instead of doing:
/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
childrel->live_partitioned_rels);
do something along the lines of:
if (childrel->part_scheme)
{
ListCell *lc;
ListCell *start = lnext(list_head(childrel->live_partitioned_rels));
for_each_cell(lc, start)
partitioned_rels = lappend_int(partitioned_rels,
lfirst_int(lc));
}
Although it seems pretty fragile. It would probably be better to find
a nicer way of handling all this.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
Thanks a lot for the review. I'm replying to your multiple emails here.
On 2017/12/28 11:07, David Rowley wrote:
I've just completed a pass over the v17 patch set. I've found a number
of things that need to be addressed. Some might seem a bit nit-picky,
sorry about that. However, many of the others genuinely need to be
addressed.
1. The following code calls RelationGetPartitionQual, but the output
of that is only needed when partition_bound_has_default(boundinfo) is
true.
Can you change this to only get the RelationGetPartitionQual when it's required?
OK, done that way.
2. The header comment in match_clauses_to_partkey() does not give any
warning that 'clauses' is modified within the function.
The function should create a copy of the clauses before modifying
them. This will save you having to do any list_copy calls when you're
calling the function.
OK, added a list_copy on clauses at the beginning of the function.
The header comment is also not very clear about what the return value
of the function is.
Fixed the comment to describe return values.
3. "method" I think should be "strategy". We've pretty much
standardised on that term everywhere else, so let's keep to standard.
/*
* Does the query specify a key to be null or not null? Partitioning
* handles null partition keys specially depending on the partitioning
* method in use, we store this information.
*/
Fixed.
4. "relation" should be in single quotes, since you're talking about
the parameter named "relation". Likewise with "partclauses", otherwise
it just seems like bad English.
* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses
Fixed.
5. partdesc's assignment can be delayed until it's needed. This will
save generating it when constfalse == true
static Bitmapset *
get_partitions_from_clauses_recurse(Relation relation, int rt_index,
List *clauses)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
Moved the whole line to a tiny block where partdesc is used.
6. In the following comment, I'd have expected almost identical code,
just with some other List, but the code probably differs a bit too
much to use "Ditto".
/*
* Ditto, but this time or_clauses.
*/
Hmm, OK. I tried rewriting the comment a bit.
7. Comment claims we use "set union", but we're really just collecting
the members from the other set in:
/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
result = bms_add_members(result, arg_partset);
Guess the comment doesn't really add much in this case, so just deleted it.
8. These arrays could just be initialized up to partkey->partnatts.
I'd imagine most of the time this will be just 1, so would save
needlessly setting the 31 other elements, although, perhaps it's a bad
idea to optimize this.
memset(keyclauses_all, 0, sizeof(keyclauses_all));
/* false means we don't know if a given key is null */
memset(keyisnull, false, sizeof(keyisnull));
/* false means we don't know if a given key is not null */
memset(keyisnotnull, false, sizeof(keyisnull));
The last two of these could just be Bitmapsets and you'd not need any
memset at all. PARTITION_MAX_KEYS just so happens to be the same as
BITS_PER_BITMAPWORD in a standard build, so you'd always be able to
mark everything with a single bitmap word. This would help a bit in
the various places that you're counting the true elements, for
example, the following code:
for (i = 0; i < partkey->partnatts; i++)
{
if (!keys->keyisnotnull[i])
{
include_def = true;
break;
}
}
could become:
include_def = !bms_is_empty(keys->keyisnotnull);
if you converted all these to Bitmapsets.
I liked the idea of using Bitmapset for keyisnull and keyisnotnull, so
implemented it.
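For illustration, the bool arrays and the counting loop quoted above can be collapsed into a single bit set. The following is only a sketch: it uses a plain uint32_t as a stand-in for a Bitmapset, and MAX_KEYS, mark_key, count_keys and need_default are made-up names, not functions from the patch. It keeps the condition of the quoted loop (include the default partition when some key column is not known to be non-null), which amounts to comparing the number of set bits against the number of key columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_KEYS 32				/* assumed stand-in for PARTITION_MAX_KEYS */

/* Mark key column 'keyidx' as known to be constrained by IS NOT NULL. */
static uint32_t
mark_key(uint32_t set, int keyidx)
{
	assert(keyidx >= 0 && keyidx < MAX_KEYS);
	return set | (UINT32_C(1) << keyidx);
}

/* Count the set bits among the first 'nkeys' positions. */
static int
count_keys(uint32_t set, int nkeys)
{
	int			count = 0;

	for (int i = 0; i < nkeys; i++)
	{
		if (set & (UINT32_C(1) << i))
			count++;
	}
	return count;
}

/*
 * Same result as the quoted loop: true as soon as one of the first
 * 'nkeys' key columns is not marked in keyisnotnull.
 */
static bool
need_default(uint32_t keyisnotnull, int nkeys)
{
	assert(nkeys >= 1 && nkeys <= MAX_KEYS);
	return count_keys(keyisnotnull, nkeys) < nkeys;
}
```

With two key columns, need_default() stays true until both columns have been marked, which matches what the loop over keyisnotnull[] computes.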
9. The following comment would be better read as: /* clause does not
match this partition key */
/* Clause not meant for this column. */
10. The following comment talks about handling less than operators for
hash opfamilies, but the code only handles <> for btree and list
partitioning.
* Handle some cases wherein the clause's operator may not
* belong to the partitioning operator family. For example,
* operators named '<>' are not listed in any operator
* family whatsoever. Also, ordering opertors like '<' are
* not listed in the hash operator family.
"opertors" should be spelled "operators"
Rewrote the comments here a bit.
11. In the following comment "operator" should be "operators":
* are constrained by clauses containing equality operator, unless hash
Likewise for:
* IS NULL clauses while remaining have clauses with equality operator.
Fixed.
12. The following code in classify_partition_bounding_keys probably
should use EXPR_MATCHES_PARTKEY.
/* Does leftop match with this partition key column? */
if ((IsA(arg, Var) &&
((Var *) arg)->varattno == partattno) ||
equal(arg, partexpr))
Done.
13. The comment for classify_partition_bounding_keys does not seem to
define well what the meaning of the return value is:
* classify_partition_bounding_keys
* Classify partition clauses into equal, min, and max keys, along with
* any Nullness constraints and return that information in the output
* argument keys (number of keys is the return value)
I had thought all along that this must mean the number of distinct
partition keys that we've found useful clauses for, but it's possible
to fool this with duplicate IS NULL checks.
create table hp (a int, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);
explain select * from hp where a is null and a is null;
This causes classify_partition_bounding_keys to return 2. I can't see
any bad side effects of this, but I think the comment needs to be
improved. Perhaps it's better just to make the function return bool,
returning true if there are any useful keys?
OK, I made classify_partition_bounding_keys() return a bool instead.
14. The switch statement in partition_op_strategy could be simplified
a bit and be written more as:
case BTLessEqualStrategyNumber:
*incl = true;
/* fall through */
case BTLessStrategyNumber:
result = PART_OP_LESS;
break;
That's better, done.
15. No default case statements in partition_op_strategy() could result
in "result" not being set.
I added a default case, but it's simply an elog(ERROR, ...).
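Roughly, the resulting shape of the switch is as follows. This is a standalone sketch rather than the patch code: the ST_* and OP_* enums and classify_strategy are hypothetical stand-ins for the btree strategy numbers, the PART_OP_* values and partition_op_strategy, and the default branch returns a sentinel where the patch raises an error with elog(ERROR, ...).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum
{
	ST_LESS = 1,
	ST_LESS_EQUAL,
	ST_EQUAL,
	ST_GREATER_EQUAL,
	ST_GREATER
} Strategy;

/* Hypothetical stand-ins for the PART_OP_* classification. */
typedef enum
{
	OP_INVALID = -1,
	OP_LESS,
	OP_EQUAL,
	OP_GREATER
} OpClass;

/* Classify a strategy number; *incl is set when equality is implied. */
static OpClass
classify_strategy(int strategy, bool *incl)
{
	OpClass		result;

	assert(incl != NULL);
	*incl = false;
	switch (strategy)
	{
		case ST_LESS_EQUAL:
			*incl = true;
			/* fall through */
		case ST_LESS:
			result = OP_LESS;
			break;
		case ST_EQUAL:
			*incl = true;
			result = OP_EQUAL;
			break;
		case ST_GREATER_EQUAL:
			*incl = true;
			/* fall through */
		case ST_GREATER:
			result = OP_GREATER;
			break;
		default:
			/* sketch only: the patch reports an error here instead */
			result = OP_INVALID;
			break;
	}
	return result;
}
```

Because every path through the switch assigns result, the compiler can no longer warn about it being used uninitialized.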
16. I'm unsure what the following comment means:
* To set eqkeys, we must have found the same for partition key columns.
The word "for" seems wrong, or I'm not sure what it wants to ensure it
finds the same for.
I rewrote this comment a bit, since like you, I too am no longer able to
make sense of it. Just wanted to say here that to set keys->eqkeys at
all, we must have found matching clauses containing equality operators for
all partition key columns.
17. Header comment for partition_op_strategy is out-of-date.
/*
* Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
* =, and >/>= operator, respectively. Sets *incl to true if equality is
* implied.
*/
It'll never return -1.
Oops, fixed.
18. The following comment has some grammar issues:
/*
* If couldn't coerce to the partition key type, that is, the type of
* datums stored in PartitionBoundInfo, no hope of using this
* expression for anything partitioning-related.
*/
Would be better with:
/*
* If we couldn't coerce the partition key type, that is, the type
* of datums stored in PartitionBoundInfo, then there's no hope of
* using this expression for anything partitioning-related.
*/
OK, done.
19. In partkey_datum_from_expr() the Assert(false) at the end seems
overkill. You could just get rid of the default in the switch
statement and have it fall through to the final return. This would
save 4 lines of code.
Actually, let's just get rid of the switch. If I remove the default case
like you suggest, there are NodeTag enum values not handled warnings.
20. The word "and" I think needs removed from the following comment:
* Couldn't compare; keep hash_clause set to the previous value and
* so add this one directly to the result. Caller would
Actually, it seems better to keep the "and" and remove the "so".
Probably also needs a comment after "value"
Done.
21. The following comment seems to indicate 'cur' is an operator, but
it's a PartClause:
/* The code below is for btree operators, which cur is not. */
It might be better to write
/* The code below handles Btree operators which are not relevant to a
hash-partitioned table. */
Agreed, done.
22. "Stuff" should be "The code" in:
* Stuff that follows closely mimics similar processing done by
Done.
23. In remove_redundant_clauses() and various other places, you have a
variable named partattoff. What's the meaning of this name? I'd
imagine it is short for "partition attribute offset", but it's a
"partition key index", so why not "partkeyidx"?
Of course, you might claim that an array index is just an offset, but
it is a little confusing if you think of attribute offsets in a
TupleDesc.
Hmm, OK. Replaced partattoff with partkeyidx.
24. get_partitions_for_keys() you're using "return" and "break" in the
switch statement. You can remove the breaks;
Oops, fixed.
25. The Assert(false) in get_partitions_for_keys seems overkill. I
think it's fine to just do:
return NULL; /* keep compiler quiet */
Done, too.
26. The following code comment in get_partitions_for_keys_hash() does
not seem very well written:
* Hash partitioning handles puts nulls into a normal partition and
* doesn't require to define a special null-accpting partition.
* Caller didn't count nulls as a valid key; do so ourselves.
Maybe "puts" should be "storing"?
Also, does hash partitioning actually support a NULL partition? This
seems to say it doesn't require, but as far as I can see it does not
*support* a NULL partition. The comment is a bit misleading.
"accpting" should be "accepting".
I rewrote that comment. I wanted to say that, unlike range and list
partitioning which have special handling for null values (there is only
one list/range partition at any given time that could contain nulls in the
partition key), hash partitioning does not. All hash partitions could
contain nulls in one or more of the partition keys and hence we must
consider nulls as regular equality keys.
27. In get_partitions_for_keys_hash() why is it possible to get a
result_index below 0? In that case, the code will end up triggering
the Assert(false), but if you want to Assert something here then maybe
it should be Assert(result_index >= 0)?
if (result_index >= 0)
return bms_make_singleton(result_index);
I too used to think that it's impossible to get result_index < 0, but it's
actually possible, because not all required hash partitions may have been
defined yet:
create table hashp (a int) partition by hash (a);
create table hashp0 partition of hashp for values with (modulus 4,
remainder 0);
If the user defines only one partition as shown above, then there is no
partition for keys whose hash yields a remainder of 1, 2, or 3.
create table hashp (a int) partition by hash (a);
create table hashp0 partition of hashp for values with (modulus 4,
remainder 0);
insert into hashp values (1);
INSERT 0 1
explain select * from hashp where a = 1;
QUERY PLAN
--------------------------------------------------------------
Append (cost=0.00..41.88 rows=13 width=4)
-> Seq Scan on hashp0 (cost=0.00..41.88 rows=13 width=4)
Filter: (a = 1)
(3 rows)
insert into hashp values (2);
ERROR: no partition of relation "hashp" found for row
DETAIL: Partition key of the failing row contains (a) = (2).
explain select * from hashp where a = 2;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
I guess I forgot to remove that Assert(false) when fixing the code after
realizing that it's indeed possible to not get a hash partition for some keys.
Removed the Assert.
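To make the failure mode concrete, here is a small sketch of the lookup. It assumes that the hash bound info keeps one slot per possible remainder of the greatest modulus, with -1 in the slots for which no partition has been created; the function names and the precomputed hash argument are invented for the example and are not the patch's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * indexes[r] holds the partition index for remainder r, or -1 when no
 * partition has been defined for that remainder (assumed layout).
 */
static int
hash_partition_for(const int *indexes, int greatest_modulus, uint64_t hash)
{
	assert(indexes != NULL && greatest_modulus > 0);
	return indexes[hash % (uint64_t) greatest_modulus];
}

/*
 * Returns the number of partitions to scan (0 or 1).  When the result
 * is 1, *part receives the partition index.
 */
static int
partitions_for_hash(const int *indexes, int greatest_modulus,
					uint64_t hash, int *part)
{
	int			result_index;

	result_index = hash_partition_for(indexes, greatest_modulus, hash);
	if (result_index < 0)
		return 0;				/* nothing to scan; not an error */
	*part = result_index;
	return 1;
}
```

With only the remainder-0 partition defined, as in the hashp example, a hash with remainder 2 simply yields an empty result instead of tripping an assertion.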
28. Is there any point in the following loop in get_partitions_for_keys_list()?
/*
* We might be able to get the answer sooner based on the nullness of
* keys, so get that out of the way.
*/
for (i = 0; i < partkey->partnatts; i++)
{
Won't partkey->partnatts always be 1? Maybe you can Assert that
instead, just in case someone forgets to change the code if LIST is to
support multiple partition keys.
I guess that's a leftover from when all partition strategies were handled
in one function. Got rid of the loop and added the Assert.
29. In get_partitions_for_keys_list and get_partitions_for_keys_range
partdesc is only used for an Assert. Maybe you can just:
Assert(RelationGetPartitionDesc(rel)->nparts > 0);
OK, done.
30. The Assert(false) in get_partitions_for_keys_list() looks
suspiciously easy to hit...
create table lp (a int not null) partition by list(a);
create table lp1 partition of lp for values in(1);
explain select * from lp where a > 11; -- Assert fail!
31. ranget?
* get_partitions_for_keys_range
* Return partitions of a ranget partitioned table for requested keys
The Assert indeed was easy to hit. Removed.
32. I'm not quite sure what this comment is saying:
/*
* eqoff is gives us the bound that is known to be <= eqkeys,
* given how partition_bound_bsearch works. The bound at eqoff+1,
* then, would be the upper bound of the only partition that needs
* to be scanned.
*/
Rewrote the comment as:
/*
* The bound at eqoff is known to be <= eqkeys, given the way
* partition_bound_bsearch works. Considering the same as the lower
* bound of the partition that eqkeys falls into, the bound at
* eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
* desired partition's index.
*/
Any better?
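As a sanity check of the eqoff + 1 reasoning, here is a simplified sketch using plain integer bounds. It assumes single-column range bounds sorted in ascending order and an indexes array with one more entry than there are bounds, where indexes[i] is the partition whose upper bound is bounds[i] and -1 marks a range with no partition. bound_bsearch and range_partition_for are illustrative names; the search loop follows the same shape as partition_bound_bsearch.

```c
#include <assert.h>

/*
 * Return the offset of the greatest bound that is <= key, or -1 if
 * every bound is greater than key.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int key)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= key)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}

/*
 * The bound at eqoff acts as the lower bound of the range that key
 * falls into, so the partition is found at indexes[eqoff + 1].
 * Returns -1 when no partition covers key.
 */
static int
range_partition_for(const int *bounds, const int *indexes,
					int nbounds, int key)
{
	int			eqoff = bound_bsearch(bounds, nbounds, key);

	return indexes[eqoff + 1];
}
```

For bounds {0, 10, 20, 30, 40} with partitions covering [0, 10), [10, 20) and [30, 40), the indexes array would be {-1, 0, 1, -1, 2, -1}: a key of 15 finds the bound 10 at offset 1 and therefore partition 1, while a key of 25 lands in the unassigned range.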
33. "one" -> "the one"
* Find the leftmost bound that satisfies the query, i.e., one that
* satisfies minkeys.
Fixed.
34. Why "will" there be multiple partitions sharing the same prefix?
* If only a prefix of the whole partition key is provided, there will
* be multiple partitions whose bound share the same prefix. If minkey
Perhaps "will" should be "may"?
I guess you're right.
35. I think "would've" in the following comment is not correct. This
seems to indicate that this has already happened, which it has not.
"will have to be" might be more correct?
* satisfy the query, but don't have a valid partition assigned. The
* default partition would've been included to cover those values.
"will have to be included" sounds correct to me too.
36. "will" -> "with"?
* Since partition keys will nulls are mapped to default range
Fixed.
37. "If no" -> "If there's no"
* If no tuple datum to compare with the bound, consider
* the latter to be greater.
38. I don't see anything about setting keynullness here:
/*
* Get the clauses that match the partition key, including information
* about any nullness tests against partition keys. Set keynullness to
* a invalid value of NullTestType, which 0 is not.
*/
partclauses = match_clauses_to_partkey(root, rel,
list_copy(rel->baserestrictinfo),
&contains_const,
&constfalse);
I guess the comment is too ancient. Fixed.
39. "paritions" -> "partitions"
* Else there are no clauses that are useful to prune any paritions,
Fixed.
40. In match_clauses_to_partkey, the following code pulls the varnos
from each operand of the expression. It would be better to just pull
the side that's needed (if any) a bit later rather than always doing
both.
left_relids = pull_varnos((Node *) leftop);
right_relids = pull_varnos((Node *) rightop);
Ah, done.
41. I think a list_copy is missing here:
/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
childrel->live_partitioned_rels);
Surely if we don't perform a list_copy of
childrel->live_partitioned_rels then subsequent additions to the
resulting list will inadvertently add new items to the
childrel->live_partitioned_rels?
Yeah, I think what's in the patch now is clearly hazardous. Added a
list_copy on childrel->live_partitioned_rels.
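The hazard can be reproduced with a toy linked list. The sketch below is not the PostgreSQL List implementation; IntList and its helpers are invented here and only model the behaviour described above, where a destructive concat makes the result share the second list's cells.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cell
{
	int			value;
	struct Cell *next;
} Cell;

typedef struct IntList
{
	Cell	   *head;
	Cell	   *tail;
	int			length;
} IntList;

/* Append a new cell holding 'value' (memory is never freed in this toy). */
static void
list_append(IntList *list, int value)
{
	Cell	   *cell = malloc(sizeof(Cell));

	assert(cell != NULL);
	cell->value = value;
	cell->next = NULL;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
	list->length++;
}

/* Destructive concat: dst ends up sharing src's cells. */
static void
list_concat_shared(IntList *dst, const IntList *src)
{
	if (src->head == NULL)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->length += src->length;
}

/* Safe variant: append copies of src's values, leaving src untouched. */
static void
list_concat_copy(IntList *dst, const IntList *src)
{
	for (const Cell *c = src->head; c != NULL; c = c->next)
		list_append(dst, c->value);
}

/* Count the cells reachable from the head, as a foreach-style walk would. */
static int
list_walk_length(const IntList *list)
{
	int			n = 0;

	for (const Cell *c = list->head; c != NULL; c = c->next)
		n++;
	return n;
}
```

After list_concat_shared(), appending to the parent list also extends the chain reachable from the child's head, so a walk over the child sees the extra entry even though its length field is unchanged. With list_concat_copy() the child list is left alone.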
42. Is this because of an existing bug?
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
 Filter: (abs(b) = 5)
 -> Seq Scan on mcrparted3
 Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
 -> Seq Scan on mcrparted5
 Filter: (abs(b) = 5)
Actually, to constraint exclusion's eyes, mcrparted4's partition
constraint is refuted by abs(b) = 5.
With the new patch, we simply do not perform any partition pruning for
that clause, because we do not have a constraint on an earlier partition
key column.
43. In partition_prune.sql you have a mix of /* */ and -- comments.
Please just use --
Oops, fixed.
On 2017/12/29 15:47, David Rowley wrote:
Just a few extras that I found:
44. In match_clauses_to_partkey you're making use of
estimate_expression_value(), I don't think this is safe.
if (IsA(estimate_expression_value(root, rightop), Const))
*contains_const = true;
The only other places I see using this in the planner are for costing
purposes. Also, the header comment for that function says it's not
safe. Particularly "This effectively means that we plan using the
first supplied value of the Param.". If that's the case, then if we're
planning a generic plan, then wouldn't it be possible that the planner
chooses the current supplied parameter value and prune away partitions
based on that value. That would make the plan invalid for any other
parameter, but it's meant to be a generic plan, so we can't do that.
You might be right. Perhaps, I was thinking of eval_const_expressions()
there, which I guess should be enough for this purpose.
45. Why use a list_copy() here?
/*
* For a nested ArrayExpr, we don't know how to get the
* actual scalar values out into a flat list, so we give
* up doing anything with this ScalarArrayOpExpr.
*/
if (arrexpr->multidims)
continue;
elem_exprs = list_copy(arrexpr->elements);
I guess I was just being paranoid there. No need, so removed.
On 2018/01/10 7:35, David Rowley wrote:
There's a small problem with get_partitions_for_keys_list(), the
following case hits the Assert(false) at the bottom of that function.
create table ab_c (a int not null, b char) partition by list(a);
create table abc_a2 (b char, a int not null) partition by list(b);
create table abc_a2_b3 partition of abc_a2 for values in ('3');
alter table ab_c attach partition abc_a2 for values in (2);
select * from ab_c where a between 1 and 2 and b <= '2';
I removed that Assert per your comment above (in an earlier email of yours).
On 2018/01/10 10:55, David Rowley wrote:
One more thing I discovered while troubleshooting a bug Beena reported
in the run-time partition pruning patch is that
classify_partition_bounding_keys properly does;
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
constexpr = leftop;
else
/* Clause not meant for this column. */
continue;
for OpExpr clauses, but does not do the same for leftop for the
ScalarArrayOpExpr test.
I'm not sure why we'd need to do that? Does the syntax of clauses that
use a ScalarArrayOpExpr() allow them to have the partition key on RHS?
Can you point me to the email where Beena reported the problem in question?
On 2018/01/10 13:18, David Rowley wrote:
Beena's tests on the run-time partition pruning patch also indirectly
exposed a problem with this patch.
Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.
A test case is:
create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list (b);
create table part2 partition of part1 for values in(1);
select * from part;
partition_rels ends up with 3 items in the list, but there's only 2
partitions here. The reason for this is that, since planning here is
recursively calling add_paths_to_append_rel, the list for part ends up
with itself and part1 in it, then since part1's list already contains
itself, per set_append_rel_size's "rel->live_partitioned_rels =
list_make1_int(rti);", then part1 ends up in the list twice.
It seems that I found the problem. Currently, the set_append_rel_size()
step already accumulates the full list in the root partitioned table's
rel->live_partitioned_rels, that is, the list of RT indexes of *all*
partitioned relations in the tree. Then, when add_paths_to_append_rel()
tries to accumulate child rel's live_partitioned_rels into the parent's,
duplication occurs, because the latter already contains all the entries as
compiled by the earlier step. I think having only the latter do the
accumulation is better, because even partition-wise join code needs this
facility and it only ever calls add_paths_to_append_rel().
Attached updated version of the patch set containing fixes for almost all
the things mentioned above.
Thanks again for your thoughtful review comments and sorry that I couldn't
reply sooner.
Thanks,
Amit
Attachments:
v18-0001-Some-interface-changes-for-partition_bound_-cmp-.patch (text/plain; charset=UTF-8)
From 505323bc4f745bd4525e46b0d511c44beb819f8c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v18 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..d937edcd83 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
v18-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 6140f2378a9d253e8cafcc10c0b41468246d6f6e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v18 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns the index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
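
As a rough illustration of the semantics described above (this is not code
from the patch), the sketch below models each clause as the set of partition
indexes it can match and combines them accordingly: mutually ANDed clauses
are intersected, while the arguments of an OR clause are unioned. A plain
64-bit mask stands in for Bitmapset, and the names partset, partset_all,
partset_and and partset_or are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means partition i is selected. */
typedef uint64_t partset;

/* Select all nparts partitions (this sketch supports 1..63 partitions). */
static partset
partset_all(int nparts)
{
    assert(nparts >= 1 && nparts <= 63);
    return (((partset) 1) << nparts) - 1;
}

/* Conjunction: a partition survives only if every clause selects it. */
static partset
partset_and(const partset *sets, int n, int nparts)
{
    partset result = partset_all(nparts);
    int     i;

    for (i = 0; i < n; i++)
        result &= sets[i];
    return result;
}

/* Disjunction: a partition survives if any OR argument selects it. */
static partset
partset_or(const partset *sets, int n)
{
    partset result = 0;
    int     i;

    for (i = 0; i < n; i++)
        result |= sets[i];
    return result;
}
```

For example, with four partitions, a clause matching partitions {0, 1, 2}
ANDed with an OR clause whose arguments match {1} and {3} leaves only
partition 1 selected.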
---
src/backend/catalog/partition.c | 1983 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 1992 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d937edcd83..0d9c774005 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,80 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * strategy in use, so we store this information.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +289,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_from_ne_clauses(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static bool classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses, List **ne_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partkeyidx, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1688,1885 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of 'relation' that will satisfy all
+ * the clauses contained in 'partclauses'
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr;
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partition_bound_has_default(boundinfo) &&
+ (partconstr = RelationGetPartitionQual(relation)) != NIL)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ bool constfalse;
+ List *or_clauses,
+ *ne_clauses;
+ ListCell *lc;
+
+ /*
+ * Try to reduce the set of clauses into a form that
+ * get_partitions_for_keys() can work with.
+ */
+ if (classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses, &ne_clauses))
+ {
+ /*
+ * classify_partition_bounding_keys() may have found pseudo-constant
+ * clauses that are false, which the planner missed, or it may have
+ * itself found contradictions among the clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ result = get_partitions_for_keys(relation, &keys);
+ }
+ else
+ {
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ }
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (constfalse || bms_is_empty(result))
+ return NULL;
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (ne_clauses)
+ {
+ Bitmapset *ne_clause_parts;
+
+ ne_clause_parts = get_partitions_from_ne_clauses(relation, ne_clauses);
+
+ /*
+ * Clauses in ne_clauses are in conjunction with the clauses that
+ * selected the partitions contained in result, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, ne_clause_parts);
+ bms_free(ne_clause_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_clause_args(relation, rt_index,
+ or->args);
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Assumes partkey exists in the scope and is of a list partitioned table. */
+#define partkey_datums_equal(d1, d2)\
+ (0 == DatumGetInt32(FunctionCall2Coll(&partkey->partsupfunc[0],\
+ partkey->partcollation[0],\
+ (d1), (d2))))
+/*
+ * Check if d is equal to some member of darray where equality is determined
+ * by the partitioning comparison function.
+ */
+static bool
+datum_in_array(PartitionKey partkey, Datum d, Datum *darray, int n)
+{
+ int i;
+
+ if (darray == NULL || n == 0)
+ return false;
+
+ for (i = 0; i < n; i++)
+ if (partkey_datums_equal(d, darray[i]))
+ return true;
+
+ return false;
+}
+
+/*
+ * count_partition_datums
+ *
+ * Returns the number of non-null datums allowed by a non-default list
+ * partition with given index.
+ */
+static int
+count_partition_datums(Relation rel, int index)
+{
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ result = 0;
+
+ Assert(index != boundinfo->default_index);
+
+ /*
+ * The answer is as many as the count of occurrence of the value index
+ * in boundinfo->indexes[].
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ if (index == boundinfo->indexes[i])
+ result += 1;
+
+ return result;
+}
+
+/*
+ * get_partitions_from_ne_clauses
+ *
+ * Return partitions of relation that satisfy all <> operator clauses in
+ * ne_clauses. Only ever called if relation is a list partitioned table.
+ */
+static Bitmapset *
+get_partitions_from_ne_clauses(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *result,
+ *excluded_parts;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Datum *exclude_datums;
+ int *count_excluded,
+ n_exclude_datums,
+ i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+
+ /*
+ * How this works:
+ *
+ * For each constant expression, we look up the partition that would
+ * contain its value and mark it as an excluded partition. After
+ * doing the same for all clauses we'll have the set of partitions that
+ * are excluded. For each excluded partition, check if there exist
+ * values that it allows but are not specified in the clauses; if so,
+ * the partition won't actually be excluded.
+ */
+
+ /* De-duplicate constant values. */
+ exclude_datums = (Datum *) palloc0(list_length(ne_clauses) *
+ sizeof(Datum));
+ n_exclude_datums = 0;
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum) &&
+ !datum_in_array(partkey, datum, exclude_datums, n_exclude_datums))
+ exclude_datums[n_exclude_datums++] = datum;
+ }
+
+ /*
+ * For each value, if it's found in boundinfo, increment the count of its
+ * partition as excluded due to that value.
+ */
+ count_excluded = (int *) palloc0(partdesc->nparts * sizeof(int));
+ for (i = 0; i < n_exclude_datums; i++)
+ {
+ int offset,
+ excluded_part;
+ bool is_equal;
+ PartitionBoundCmpArg arg;
+ Datum argdatums[] = {exclude_datums[i]};
+
+ memset(&arg, 0, sizeof(arg));
+ arg.datums = argdatums;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
+ {
+ excluded_part = boundinfo->indexes[offset];
+ count_excluded[excluded_part]++;
+ }
+ }
+
+ excluded_parts = NULL;
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ /*
+ * If all datums of this partition appeared in ne_clauses, exclude
+ * this partition.
+ */
+ if (count_excluded[i] > 0 &&
+ count_excluded[i] == count_partition_datums(relation, i))
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Also, exclude the "null-only" partition, because strict clauses in
+ * ne_clauses will not select any rows from it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ count_partition_datums(relation, boundinfo->null_index) == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(count_excluded);
+ pfree(exclude_datums);
+
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ result = bms_del_members(result, excluded_parts);
+ bms_free(excluded_parts);
+
+ return result;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to result. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument 'keys' (Returns true if 'keys' contains valid information
+ * upon return, otherwise false.)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing equality operators, unless hash
+ * partitioning is in use, in which case it's possible that some keys have
+ * IS NULL clauses while the rest have clauses with equality operators.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ *
+ * Clauses containing a <> operator are added to *ne_clauses, provided the
+ * operator's negator is a valid partitioning equality operator and list
+ * partitioning is in use.
+ */
+static bool
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses,
+ List **ne_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool will_compute_keys = false;
+ Bitmapset *keyisnull = NULL,
+ *keyisnotnull = NULL;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *ne_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to an arbitrary expression, for which we get
+ * the current one from PartitionKey.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_from_ne_clauses().
+ */
+ if (is_ne_listp)
+ *ne_clauses = lappend(*ne_clauses, pc);
+ else
+ {
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not
+ * listed as part of any operator family, but we are able to
+ * handle it if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ if (!elem_nulls[i])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[i],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull = bms_add_member(keyisnull, i);
+ else
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ n_keynullness++;
+ will_compute_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (!will_compute_keys || *constfalse)
+ return 0;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return 0;
+ }
+
+ /*
+ * Generate bounding tuple(s).
+ *
+ * We look up partitions in the partition bound descriptor using, say,
+ * partition_bound_bsearch(), which expects a Datum (or Datums if multi-
+ * column key). So, extract the same out of the constant argument of
+ * each clause.
+ *
+ * Further, based on the strategies of clause operators (=, </<=, >/>=),
+ * try to construct tuples out of those datums that serve as the exact
+ * look-up tuple or minimum/maximum bounding tuple(s). If we find datums
+ * for all partition key columns that appear in = operator clauses, then
+ * we have the look-up tuple to be exactly matched, which will return just
+ * one partition if one exists. If the last value of the tuple comes from
+ * a </<= or >/>= operator, then that constitutes the maximum and minimum
+ * bounding tuple, respectively. There is one exception -- if the tuple
+ * constitutes a proper prefix of partition key columns, with none of its
+ * values coming from a </<= or >/>= operator, we consider such tuple both
+ * the minimum and maximum bounding tuple. For a multi-column range
+ * partitioned table, there usually exists a sequence of consecutive
+ * partitions that share a prefix of partition bound, which are all
+ * matched by a bounding tuple of the aforementioned shape.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing equality
+ * operators for all partition key columns; if we did, we no longer need
+ * the values in minkeys and maxkeys. In the case of hash partitioning,
+ * we don't require all of eqkeys to come from operator clauses, because
+ * any IS NULL clauses involving partition key columns are also
+ * considered equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = keyisnull;
+ keys->keyisnotnull = keyisnotnull;
+
+ return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'op' contains an =, </<=, or >/>=
+ * operator and sets *incl if equality is implied
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ PartOpStrategy result;
+
+ *incl = false; /* overwritten as appropriate below */
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = PART_OP_EQUAL;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ result = PART_OP_LESS;
+ break;
+ case BTEqualStrategyNumber:
+ *incl = true;
+ result = PART_OP_EQUAL;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ result = PART_OP_GREATER;
+ break;
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Extract constant value from expr and set *value to that value
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partkeyidx,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partkeyidx],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with the same. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard a <= 3 (or a <= 4) as redundant. If the latter instead
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor
+ * using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Hash partitioning stores partition keys containing nulls in regular
+ * partitions. That is, the code that determines the hash partition for
+ * a given row admits nulls in the partition key when computing the key's
+ * hash. So, here we treat any IS NULL clauses on partition key columns as
+ * equality keys, along with any other non-null values coming from equality
+ * operator clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ int other_idx = -1;
+
+ /*
+ * Only a designated partition accepts nulls; if one exists, return
+ * it. Otherwise, nulls go to the default partition, if there is one.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * minoff set to -1 means all datums are greater than minkeys, which
+ * means all partitions satisfy minkeys. In that case, set minoff to
+ * the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if it does,
+ * but minkey isn't inclusive, move to the bound on the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged, because
+ * it simply means none of the partitions satisfies maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal), but the
+ * maxkey is not inclusive, then go to the bound on left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some datums
+ * (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ Bitmapset *result = NULL;
+
+ /*
+ * All datums between those at minoff and maxoff satisfy the query
+ * keys, so add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ /* Look up using binary search if eqkeys matches any of the datums. */
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering the same as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If only a prefix of the whole partition key is provided, there may
+ * be multiple partitions whose bound share the same prefix. If minkey
+ * is inclusive, we must make minoff point to the leftmost such bound,
+ * making the result contain all such partitions. If it is exclusive,
+ * we must move minoff to the right such that minoff points to the
+ * first partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in the
+ * result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * At this point, minoff gives us the leftmost bound that is known to
+ * be <= query's minkey. The bound at minoff + 1 (if there is one),
+ * then, would be the upper bound of the leftmost partition that needs
+ * to be scanned.
+ */
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ *
+ * The indexes array has one more entry than there are range datums.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if we went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * At this point, maxoff gives us the rightmost bound that is known to
+ * be <= query's maxkey. The bound at maxoff+1, then, would be the
+ * upper bound of the rightmost partition that needs to be scanned.
+ * Although, if the bound is equal to maxkeys and the latter is not
+ * inclusive, then the bound at maxoff itself is the upper bound of
+ * the rightmost partition that needs to be scanned.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_def = false;
+ Bitmapset *result = NULL;
+
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of
+ * an unassigned range of values, move to the adjacent bound which must
+ * be the upper bound of the leftmost or rightmost partition,
+ * respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned. The
+ * default partition will have to be included to cover those values.
+ * Although, if the original bound in question is an infinite value,
+ * there would not be any unassigned range to speak of, because the
+ * range is unbounded in that direction by definition, so no need to
+ * include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any non-default
+ * range partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cf38b4eb5e..ccfae4f31e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..8423c6e886 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -73,4 +73,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
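
To make the clause-reduction logic of remove_redundant_clauses() in the patch above easier to follow, here is a minimal standalone sketch. It is not the patch's code. It assumes a single integer key, and the names OpKind, Clause, eval_op and reduce_clauses are hypothetical stand-ins for the btree strategy numbers, PartClause and partition_cmp_args(). It follows the same three steps as the comments in the patch: keep the most restrictive clause per operator, let an equality clause subsume or contradict the others, then keep only one of <, <= and one of >, >=.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_MAX } OpKind;

/* A clause of the form "key op value" on a single integer key. */
typedef struct Clause
{
    OpKind op;
    int    value;
} Clause;

/* Evaluate "left op right" on the constant operands. */
static bool
eval_op(OpKind op, int left, int right)
{
    switch (op)
    {
        case OP_LT: return left < right;
        case OP_LE: return left <= right;
        case OP_EQ: return left == right;
        case OP_GE: return left >= right;
        case OP_GT: return left > right;
        default:    assert(false);
    }
    return false;
}

/*
 * Keep only the most restrictive clauses from in[0..n-1], writing them to
 * out[] (room for OP_MAX entries) and their count to *nout.  Returns false
 * if two clauses are found to be mutually contradictory.
 */
static bool
reduce_clauses(const Clause *in, int n, Clause *out, int *nout)
{
    const Clause *best[OP_MAX] = {NULL};
    const Clause *eq;
    int s;

    *nout = 0;

    /* Step 1: per operator, keep the most restrictive clause. */
    for (int i = 0; i < n; i++)
    {
        const Clause *cur = &in[i];

        s = cur->op;
        if (best[s] == NULL ||
            eval_op(cur->op, cur->value, best[s]->value))
            best[s] = cur;
        else if (s == OP_EQ)
            return false;       /* e.g. a = 5 AND a = 7 */
    }

    /* Step 2: an equality clause subsumes or contradicts the others. */
    eq = best[OP_EQ];
    for (s = 0; eq != NULL && s < OP_MAX; s++)
    {
        if (s == OP_EQ || best[s] == NULL)
            continue;
        if (!eval_op((OpKind) s, eq->value, best[s]->value))
            return false;       /* e.g. a = 5 AND a < 5 */
        best[s] = NULL;
    }

    /* Step 3: keep only one of <, <= and only one of >, >=. */
    if (best[OP_LT] && best[OP_LE])
    {
        if (best[OP_LT]->value <= best[OP_LE]->value)
            best[OP_LE] = NULL;
        else
            best[OP_LT] = NULL;
    }
    if (best[OP_GT] && best[OP_GE])
    {
        if (best[OP_GT]->value >= best[OP_GE]->value)
            best[OP_GE] = NULL;
        else
            best[OP_GT] = NULL;
    }

    /* Emit the surviving clauses in operator order. */
    for (s = 0; s < OP_MAX; s++)
        if (best[s])
            out[(*nout)++] = *best[s];
    return true;
}
```

Like the patch, the sketch does not try to prove a contradiction between a lower and an upper bound (say a > 5 AND a < 3); such a pair simply survives as two clauses and is left for the bound lookup to resolve.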
Attachment: v18-0003-Move-some-code-of-set_append_rel_size-to-separat.patch (text/plain; charset=UTF-8)
From 2a6df63095c7851abb5dbbee7c2a006c4354b798 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v18 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes the basic properties of a partition's
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. It will need to be called by the pairwise-join
related code to minimally initialize the partitions that earlier
planning would have considered pruned and hence left untouched. That
isn't the case currently, because the current pruning method touches
each partition (setting its basic properties) before considering it
pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 12a6ee4a22..f5c11a17cf 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 725694f570..9b4288ad92 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -300,5 +300,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
v18-0004-More-refactoring-around-partitioned-table-Append.patch
From 8dfa6e6150f4b3719313909688cd9323329b216b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v18 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use it to initialize the partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 ++++++-
4 files changed, 115 insertions(+), 56 deletions(-)
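
Note (not part of the patch): the bookkeeping described above can be sketched in isolation as follows. This is a simplified model under stated assumptions: SketchAppInfo, SketchParent, SKETCH_MAX_PARTS and both functions are hypothetical names, fixed-size arrays replace the palloc'd array and the List, and a caller-supplied pruned[] flag array stands in for whatever set_append_rel_size() decides about each child.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the PostgreSQL structs. */
#define SKETCH_MAX_PARTS 16

typedef struct SketchAppInfo
{
	int			parent_relid;
	int			child_relid;
} SketchAppInfo;

typedef struct SketchParent
{
	int			nparts;
	/* all partitions, in bound order (models part_appinfos) */
	SketchAppInfo *part_appinfos[SKETCH_MAX_PARTS];
	int			nlive;
	/* partitions that survived pruning (models live_part_appinfos) */
	SketchAppInfo *live_part_appinfos[SKETCH_MAX_PARTS];
} SketchParent;

/*
 * Scan the global appinfo list once, when the parent is built, and keep
 * only this parent's children.  Returns the number of partitions found.
 */
int
sketch_collect_children(SketchParent *parent, int parent_relid,
						SketchAppInfo *all, int nall)
{
	int			i;

	parent->nparts = 0;
	parent->nlive = 0;
	for (i = 0; i < nall; i++)
	{
		/* The global list contains all append rels; ignore others. */
		if (all[i].parent_relid != parent_relid)
			continue;
		assert(parent->nparts < SKETCH_MAX_PARTS);
		parent->part_appinfos[parent->nparts++] = &all[i];
	}
	return parent->nparts;
}

/*
 * Size-estimation pass: visit only this parent's own children and
 * remember the ones that were not pruned, so that the later path
 * generation pass never looks at a pruned child.
 */
int
sketch_mark_live(SketchParent *parent, const bool *pruned)
{
	int			i;

	parent->nlive = 0;
	for (i = 0; i < parent->nparts; i++)
	{
		if (!pruned[i])
			parent->live_part_appinfos[parent->nlive++] =
				parent->part_appinfos[i];
	}
	return parent->nlive;
}
```

In this model the scan over the global list happens once per parent rather than once per planning pass, and later passes iterate over the (usually much shorter) live list.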
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f5c11a17cf..83035a3e8d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7b52dadd81..b0f6051618 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..4b5d50eb2c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 71689b8ed6..63623f2687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
v18-0005-Teach-planner-to-use-get_partitions_from_clauses.patch
From 07ee31fe87cc91fb1d7e4359b639333ea1f8cf59 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v18 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
src/backend/optimizer/path/allpaths.c | 399 ++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 917 insertions(+), 78 deletions(-)
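
Note (not part of the patch): to illustrate why matching clauses once against the partition key can beat per-partition refutation, here is a toy model for a single clause of the form "key = value" over range partitions. It is only a sketch under stated assumptions: both function names are hypothetical, partitions are contiguous integer ranges described by their exclusive upper bounds, and the binary search is a stand-in chosen for illustration. It is not a description of how get_partitions_from_clauses() is implemented, which is not shown in this patch.

```c
#include <assert.h>

/*
 * Partition i covers [previous upper bound, upper[i]); partition 0 starts
 * at 'lower' (inclusive).  Both functions return the index of the one
 * partition that can contain 'value', or -1 if none can.
 */

/* Per-partition check, in the spirit of constraint exclusion. */
int
sketch_prune_linear(const int *upper, int nparts, int lower, int value)
{
	int			i;
	int			lo = lower;

	for (i = 0; i < nparts; i++)
	{
		/* Test the clause against this partition's own constraint. */
		if (value >= lo && value < upper[i])
			return i;
		lo = upper[i];
	}
	return -1;
}

/* Single lookup over the ordered bounds: find first upper bound > value. */
int
sketch_prune_bsearch(const int *upper, int nparts, int lower, int value)
{
	int			lo = 0;
	int			hi = nparts;

	if (value < lower)
		return -1;
	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (upper[mid] > value)
			hi = mid;
		else
			lo = mid + 1;
	}
	return (lo < nparts) ? lo : -1;
}
```

Both functions agree on every input; the second one inspects a logarithmic number of bounds instead of every partition's constraint, which is the kind of saving the commit message hopes for.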
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 83035a3e8d..c9c998bd34 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,390 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key. It's also a good idea
+ * to check if the matched clauses contain constant values that can be
+ * used for pruning and go to get_partitions_from_clauses() only if so.
+ * If rel->baserestrictinfo might contain mutually contradictory clauses,
+ * also find out about that.
+ */
+ partclauses = match_clauses_to_partkey(root, rel, rel->baserestrictinfo,
+ &contains_const, &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * There are no clauses that are useful to prune any partitions, so
+ * scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * Returned list contains clauses matched to the partition key columns and
+ * *contains_const and *constfalse are set as described below.
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately. If a NullTest against a partition key is encountered, it's
+ * added to the result as well.
+ *
+ * *contains_const is set if at least one matched clause contains a constant
+ * operand or is a nullness test. *constfalse is set if the input list
+ * contains a pseudo-constant RestrictInfo with false value.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ /* Make a copy, because we may scribble on it below. */
+ clauses = list_copy(clauses);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = pull_varnos((Node *) rightop);
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = pull_varnos((Node *) leftop);
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators to think sanely about the
+ * behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(eval_const_expressions(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1282,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1d152c514e..d9249f4c33 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8c60b35068..c103deb21b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1247,22 +1246,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1929,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 63623f2687..855d51ea09 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not same as the collation implied by column's type, store
+ * the same separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..83e60814f7 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..13b12078bf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 2018/01/04 23:29, Ashutosh Bapat wrote:
On Fri, Dec 29, 2017 at 6:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I happened to notice that Ashutosh's patch series at
/messages/by-id/CAFjFpReJhFSoy6DqH0ipFSHd=sLNEkSzAtz4VWCaS-w2jZL=uw@mail.gmail.com
has a 0001 patch that modifies the partition_bound_cmp stuff too.
Are those conflicting?

Ashutosh's commit message:
Modify bound comparison functions to accept members of PartitionKey

Functions partition_bound_cmp(), partition_rbound_cmp() and
partition_rbound_datum_cmp() are required to merge partition bounds
from joining relations. While doing so, we do not have access to the
PartitionKey of either relations. So, modify these functions to accept
only required members of PartitionKey so that the functions can be
reused for merging bounds.

Amit's:
Some interface changes for partition_bound_{cmp/bsearch}

Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.

I haven't looked at Amit's changes, but we need a more flexible way to
pass information required for datum comparison than using
PartitionKey, since that's not available in the optimizer and can not
be associated with join, aggregate relations. If we pass that
information through a structure, there are two ways
1. it will need to be part of PartitionScheme; I am not sure if we can
have a substructure in PartitionKey. But if we can do it that way, we
can pass that structure to the functions.
2. we will need to construct the structure filling it with comparison
information and pass it to the comparison functions. I think what we
achieve out of this isn't worth the code we will need to add.

I would prefer the first approach over the other.
ISTM that they're non-conflicting for the most part. My patch is about
modifying the way to bring "datums" into partition_bound_cmp(), whereas
Ashutosh's is about modifying the way we bring the partition key
information. The changes seem orthogonal to me, although the patches
definitely won't like each other when applied to the tree.
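
For readers following along, the two changes can be pictured together in a
small standalone sketch. This is only an illustration under assumed names
(ToyBoundCmpInfo, toy_cmp_long and toy_bound_cmp are hypothetical and appear
in neither patch): the comparison information travels in its own structure
rather than in PartitionKey, and the caller says how many datums to compare.

```c
#include <stddef.h>

/* Hypothetical per-column comparison callback; not a PostgreSQL type. */
typedef int (*toy_cmp_fn) (long a, long b);

/*
 * Hypothetical container for the comparison information, standing in for
 * "only the required members of PartitionKey".
 */
typedef struct ToyBoundCmpInfo
{
	int			nkeys;		/* number of partition key columns */
	const toy_cmp_fn *cmp;	/* one comparison function per column */
} ToyBoundCmpInfo;

/* Three-way comparison of two longs. */
static int
toy_cmp_long(long a, long b)
{
	return (a > b) - (a < b);
}

/*
 * Compare the first ndatums columns of two bounds, column by column.
 * The caller chooses ndatums, so a probe may carry fewer datums than
 * there are key columns.
 */
static int
toy_bound_cmp(const ToyBoundCmpInfo *info,
			  const long *b1, const long *b2, int ndatums)
{
	int			i;

	for (i = 0; i < ndatums && i < info->nkeys; i++)
	{
		int			r = info->cmp[i] (b1[i], b2[i]);

		if (r != 0)
			return r;
	}
	return 0;
}
```

With bounds (1, 5) and (1, 7), comparing one datum reports equality, while
comparing two datums orders the first bound before the second.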
Thanks,
Amit
Thanks for addressing that list.
Just one thing to reply on before I look at the updated version:
On 11 January 2018 at 22:52, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/10 10:55, David Rowley wrote:
One more thing I discovered while troubleshooting a bug Beena reported
in the run-time partition pruning patch is that
classify_partition_bounding_keys properly does;

    if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
        constexpr = rightop;
    else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
        constexpr = leftop;
    else
        /* Clause not meant for this column. */
        continue;

for OpExpr clauses, but does not do the same for leftop for the
ScalarArrayOpExpr test.

I'm not sure why we'd need to do that? Does the syntax of clauses that
use a ScalarArrayOpExpr() allow them to have the partition key on RHS?
No, but there's no test to ensure the leftop matches the partition key.
There's just:
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
Oid saop_op = saop->opno;
Oid saop_opfuncid = saop->opfuncid;
Oid saop_coll = saop->inputcollid;
Node *leftop = (Node *) linitial(saop->args),
*rightop = (Node *) lsecond(saop->args);
List *elem_exprs,
*elem_clauses;
ListCell *lc1;
bool negated = false;
/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle the same if its negator is indeed a part of the
* partitioning operator family.
*/
if (!op_in_opfamily(saop_op, partopfamily))
{
Oid negator = get_negator(saop_op);
int strategy;
Oid lefttype,
righttype;
if (!OidIsValid(negator))
continue;
get_op_opfamily_properties(negator, partopfamily, false,
&strategy,
&lefttype, &righttype);
if (strategy == BTEqualStrategyNumber)
negated = true;
}
Since there's nothing to reject the clause that does not match the
partition key, the IN's left operand might be of any random type, and
may well not be in partopfamily, so when it comes to looking up
get_op_opfamily_properties() you'll hit: elog(ERROR, "operator %u is
not a member of opfamily %u", opno, opfamily);
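
A minimal standalone sketch of the kind of guard being asked for, using
made-up toy types rather than the planner's real ones (ToyClause,
toy_op_in_family and toy_clause_usable are hypothetical, not code from the
patch): the clause is rejected when its left operand is not the partition
key, before any operator lookup is attempted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a clause; not a PostgreSQL node type. */
typedef struct ToyClause
{
	int			left_attno;	/* column referenced by the left operand */
	int			opno;		/* operator used by the clause */
} ToyClause;

/* Toy "operator family": membership test over a plain array of operators. */
static bool
toy_op_in_family(int opno, const int *family, size_t nmembers)
{
	size_t		i;

	for (i = 0; i < nmembers; i++)
	{
		if (family[i] == opno)
			return true;
	}
	return false;
}

/*
 * Decide whether a clause can be considered for pruning on the partition
 * key column part_attno.  Clauses on other columns are skipped first, so
 * their operators are never looked up in this column's family.
 */
static bool
toy_clause_usable(const ToyClause *clause, int part_attno,
				  const int *family, size_t nmembers)
{
	if (clause->left_attno != part_attno)
		return false;		/* clause not meant for this column */

	return toy_op_in_family(clause->opno, family, nmembers);
}
```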
Still looking at the v17 patch here, but I also don't see a test to
see if the IsBooleanOpfamily(partopfamily) is checking it matches the
partition key.
Can you point me to the email where Beena reported the problem in question?
/messages/by-id/CAOG9ApERiop7P=GRkqQKa82AuBKjxN3qVixie3WK4WqQpEjS6g@mail.gmail.com
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11 January 2018 at 22:52, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Can you point me to the email where Beena reported the problem in question?
/messages/by-id/CAOG9ApERiop7P=GRkqQKa82AuBKjxN3qVixie3WK4WqQpEjS6g@mail.gmail.com
To save you from having to look at the run-time prune patch, here's
a case that breaks in v18.
create table xy (a int, b text) partition by range (a,b);
create table xy1 partition of xy for values from (0,'a') to (10, 'b');
select * from xy where a = 1 and b in('x','y');
ERROR: operator 531 is not a member of opfamily 1976
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/01/11 19:23, David Rowley wrote:
On 11 January 2018 at 22:52, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/10 10:55, David Rowley wrote:
One more thing I discovered while troubleshooting a bug Beena reported
in the run-time partition pruning patch is that
classify_partition_bounding_keys properly does;
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
constexpr = leftop;
else
/* Clause not meant for this column. */
continue;
for OpExpr clauses, but does not do the same for leftop for the
ScalarArrayOpExpr test.
I'm not sure why we'd need to do that? Does the syntax of clauses that
use a ScalarArrayOpExpr() allow them to have the partition key on RHS?
No, but there's no test to ensure the leftop matches the partition key.
There's just:
ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
Oid saop_op = saop->opno;
Oid saop_opfuncid = saop->opfuncid;
Oid saop_coll = saop->inputcollid;
Node *leftop = (Node *) linitial(saop->args),
*rightop = (Node *) lsecond(saop->args);
List *elem_exprs,
*elem_clauses;
ListCell *lc1;
bool negated = false;
/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle the same if its negator is indeed a part of the
* partitioning operator family.
*/
if (!op_in_opfamily(saop_op, partopfamily))
{
Oid negator = get_negator(saop_op);
int strategy;
Oid lefttype,
righttype;
if (!OidIsValid(negator))
continue;
get_op_opfamily_properties(negator, partopfamily, false,
&strategy,
&lefttype, &righttype);
if (strategy == BTEqualStrategyNumber)
negated = true;
}
Since there's nothing to reject the clause that does not match the
partition key, the IN's left operand might be of any random type, and
may well not be in partopfamily, so when it comes to looking up
get_op_opfamily_properties() you'll hit: elog(ERROR, "operator %u is
not a member of opfamily %u", opno, opfamily);
Ah, I completely missed that. So we need something like the following in
this IsA(clause, ScalarArrayOpExpr) block:
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
Still looking at the v17 patch here, but I also don't see a test to
see if the IsBooleanOpfamily(partopfamily) is checking it matches the
partition key.
You're right. Added checks there as well.
Can you point me to the email where Beena reported the problem in question?
/messages/by-id/CAOG9ApERiop7P=GRkqQKa82AuBKjxN3qVixie3WK4WqQpEjS6g@mail.gmail.com
To save you from having to look at the run-time prune patch, here's
a case that breaks in v18.
create table xy (a int, b text) partition by range (a,b);
create table xy1 partition of xy for values from (0,'a') to (10, 'b');
select * from xy where a = 1 and b in('x','y');
ERROR: operator 531 is not a member of opfamily 1976
You'll be able to see that the error no longer appears with the attached
updated set of patches, but I'm now seeing that the plan the patched
code produces for this particular query differs from what master
(constraint exclusion) produces. Master produces a plan with no
partitions (as one would think is the correct plan), whereas the patched
code produces a plan that includes the xy1 partition. I will think about
that a bit and post something later.
Thanks,
Amit
Attachments:
v19-0001-Some-interface-changes-for-partition_bound_-cmp-.patch (text/plain; charset=UTF-8)
From 505323bc4f745bd4525e46b0d511c44beb819f8c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v19 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
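
The effect of comparing only a caller-specified number of datums can be sketched with a toy model in C. This is an illustration under stated assumptions, not the patch's code: bounds are plain int arrays rather than Datums, there are no MINVALUE/MAXVALUE kinds, and toy_bound_datum_cmp, toy_bound_bsearch and TOY_NKEYS are hypothetical names. The search loop follows the same "greatest bound less than or equal to the probe" shape as partition_bound_bsearch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NKEYS 2				/* hypothetical number of key columns */

/*
 * Compare only the first nprobe columns of a bound against the probe, so a
 * probe may supply values for just a prefix of the key.
 */
static int
toy_bound_datum_cmp(const int *bound, const int *probe, int nprobe)
{
	int			i;

	assert(nprobe >= 1 && nprobe <= TOY_NKEYS);

	for (i = 0; i < nprobe; i++)
	{
		if (bound[i] < probe[i])
			return -1;
		if (bound[i] > probe[i])
			return 1;
	}
	return 0;
}

/*
 * Return the index of the greatest bound that is <= the probe, or -1 if all
 * bounds are greater. *is_equal reports whether that bound compared equal
 * on the columns the probe supplied.
 */
static int
toy_bound_bsearch(const int (*bounds)[TOY_NKEYS], int nbounds,
				  const int *probe, int nprobe, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = toy_bound_datum_cmp(bounds[mid], probe, nprobe);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

With a one-column probe, several bounds that share the same first column all compare equal, so the search may stop at any of them. A caller working with key prefixes has to allow for that.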
---
src/backend/catalog/partition.c | 164 ++++++++++++++++++++++++++++++----------
1 file changed, 122 insertions(+), 42 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..d937edcd83 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (is_bound is true in that case and either lbound or rbound is set), or a
+ * new tuple's partition key specified in datums (where ndatums = number of
+ * partition key columns specified in the query).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg could either be a partition bound or a Datum array representing
+ * the partition key of a tuple being routed. We simply pass that down to
+ * partition_bound_cmp where it is interpreted appropriately.
*
* *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
v19-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 70cfc0a75f7ec35eeb45002c0f81f86a0574400d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v19 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
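
As a rough illustration of the idea, the following toy in C first reduces ANDed clauses on a single integer key to one [lo, hi] interval and then locates the matching range partitions with a binary search over the sorted bounds, instead of testing each partition's constraint in turn. It is a sketch under simplifying assumptions and not the code in this patch: there is one key column, no default or NULL partition, a uint32_t bitmask stands in for Bitmapset, and ToyOp, ToyKeyClause, ToyRangePart and toy_get_partitions are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef enum ToyOp {TOY_EQ, TOY_LT, TOY_LE, TOY_GT, TOY_GE} ToyOp;

typedef struct ToyKeyClause
{
	ToyOp		op;				/* "key op value" */
	int			value;
} ToyKeyClause;

typedef struct ToyRangePart
{
	int			lower;			/* inclusive */
	int			upper;			/* exclusive */
} ToyRangePart;

/*
 * parts[] must be sorted by lower bound and non-overlapping. Bit i of the
 * result is set if partition i may contain rows satisfying all clauses.
 */
static uint32_t
toy_get_partitions(const ToyRangePart *parts, int nparts,
				   const ToyKeyClause *clauses, int nclauses)
{
	long long	lo = LLONG_MIN;
	long long	hi = LLONG_MAX;
	uint32_t	result = 0;
	int			first = 0;
	int			last = nparts;
	int			i;

	assert(nparts >= 0 && nparts <= 32);

	/* Keep only the tightest lower and upper bound implied by the clauses. */
	for (i = 0; i < nclauses; i++)
	{
		long long	v = clauses[i].value;

		switch (clauses[i].op)
		{
			case TOY_EQ:
				if (v > lo) lo = v;
				if (v < hi) hi = v;
				break;
			case TOY_GT:
				if (v + 1 > lo) lo = v + 1;
				break;
			case TOY_GE:
				if (v > lo) lo = v;
				break;
			case TOY_LT:
				if (v - 1 < hi) hi = v - 1;
				break;
			case TOY_LE:
				if (v < hi) hi = v;
				break;
		}
	}

	/* Mutually contradictory clauses select nothing. */
	if (lo > hi)
		return 0;

	/* Binary search for the first partition whose upper bound exceeds lo. */
	while (first < last)
	{
		int			mid = first + (last - first) / 2;

		if ((long long) parts[mid].upper > lo)
			last = mid;
		else
			first = mid + 1;
	}

	/* Every following partition that starts at or below hi overlaps. */
	for (i = first; i < nparts && (long long) parts[i].lower <= hi; i++)
		result |= (uint32_t) 1 << i;

	return result;
}
```

The cost here depends on the number of clauses plus a logarithmic search and the number of selected partitions, rather than on the total number of partitions.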
---
src/backend/catalog/partition.c | 2013 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 2022 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d937edcd83..ac7953e589 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,80 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing the same in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Bounding scan keys to look up a table's partitions obtained from
+ * mutually-ANDed clauses containing partitioning-compatible operators
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Constants constituting the *whole* partition key compared using
+ * partitioning-compatible equality operator(s). When n_eqkeys > 0, other
+ * keys (minkeys and maxkeys) are irrelevant.
+ *
+ * Equal keys are not required to be in any particular order, unlike the
+ * keys below which must appear in the same order as partition keys.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Constants that constitute the lower bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using > or >=
+ * operator compatible with partitioning, making this the lower bound in
+ * a range query.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ /*
+ * Constants that constitute the upper bound on the partition key or a
+ * prefix thereof. The last of those constants is compared using < or <=
+ * operator compatible with partitioning, making this the upper bound in
+ * a range query.
+ */
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Does the query specify a key to be null or not null? Partitioning
+ * handles null partition keys specially depending on the partitioning
+ * strategy in use, we store this information.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +289,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_from_ne_clauses(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static bool classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses, List **ne_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partkeyidx, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1688,1915 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of 'relation' that will satisfy all
+ * the clauses contained in 'partclauses'
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr;
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partition_bound_has_default(boundinfo) &&
+ (partconstr = RelationGetPartitionQual(relation)) != NIL)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ partclauses = list_concat(partclauses, partconstr);
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ Bitmapset *result = NULL;
+ PartScanKeyInfo keys;
+ bool constfalse;
+ List *or_clauses,
+ *ne_clauses;
+ ListCell *lc;
+
+ /*
+ * Try to reduce the set of clauses into a form that
+ * get_partitions_for_keys() can work with.
+ */
+ if (classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses, &ne_clauses))
+ {
+ /*
+ * classify_partition_bounding_keys() may have found clauses marked
+ * pseudo-constant that are false that the planner didn't or it may
+ * have itself found contradictions among clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ result = get_partitions_for_keys(relation, &keys);
+ }
+ else
+ {
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ result = bms_add_range(result, 0, partdesc->nparts - 1);
+ }
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (constfalse || bms_is_empty(result))
+ return NULL;
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (ne_clauses)
+ {
+ Bitmapset *ne_clause_parts;
+
+ ne_clause_parts = get_partitions_from_ne_clauses(relation, ne_clauses);
+
+ /*
+ * Clauses in ne_clauses are in conjunction with the clauses that
+ * selected the partitions contained in result, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, ne_clause_parts);
+ bms_free(ne_clause_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_clause_args(relation, rt_index,
+ or->args);
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Assumes partkey exists in the scope and is of a list partitioned table. */
+#define partkey_datums_equal(d1, d2)\
+ (0 == DatumGetInt32(FunctionCall2Coll(&partkey->partsupfunc[0],\
+ partkey->partcollation[0],\
+ (d1), (d2))))
+/*
+ * Check if d is equal to some member of darray where equality is determined
+ * by the partitioning comparison function.
+ */
+static bool
+datum_in_array(PartitionKey partkey, Datum d, Datum *darray, int n)
+{
+ int i;
+
+ if (darray == NULL || n == 0)
+ return false;
+
+ for (i = 0; i < n; i++)
+ if (partkey_datums_equal(d, darray[i]))
+ return true;
+
+ return false;
+}
+
+/*
+ * count_partition_datums
+ *
+ * Returns the number of non-null datums allowed by a non-default list
+ * partition with given index.
+ */
+static int
+count_partition_datums(Relation rel, int index)
+{
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ result = 0;
+
+ Assert(index != boundinfo->default_index);
+
+ /*
+ * The answer is as many as the count of occurrence of the value index
+ * in boundinfo->indexes[].
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ if (index == boundinfo->indexes[i])
+ result += 1;
+
+ return result;
+}
+
+/*
+ * get_partitions_from_ne_clauses
+ *
+ * Return partitions of relation that satisfy all <> operator clauses in
+ * ne_clauses. Only ever called if relation is a list partitioned table.
+ */
+static Bitmapset *
+get_partitions_from_ne_clauses(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *result,
+ *excluded_parts;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Datum *exclude_datums;
+ int *count_excluded,
+ n_exclude_datums,
+ i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+
+ /*
+ * How this works:
+ *
+ * For each constant expression, we look up the partition that would
+ * contain its value and mark the same as excluded partition. After
+ * doing the same for all clauses we'll have set of partitions that
+ * are excluded. For each excluded partition, check if there exist
+ * values that it allows but are not specified in the clauses, if so
+ * the partition won't actually be excluded.
+ */
+
+ /* De-duplicate constant values. */
+ exclude_datums = (Datum *) palloc0(list_length(ne_clauses) *
+ sizeof(Datum));
+ n_exclude_datums = 0;
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum) &&
+ !datum_in_array(partkey, datum, exclude_datums, n_exclude_datums))
+ exclude_datums[n_exclude_datums++] = datum;
+ }
+
+ /*
+ * For each value, if it's found in boundinfo, increment the count of its
+ * partition as excluded due to that value.
+ */
+ count_excluded = (int *) palloc0(partdesc->nparts * sizeof(int));
+ for (i = 0; i < n_exclude_datums; i++)
+ {
+ int offset,
+ excluded_part;
+ bool is_equal;
+ PartitionBoundCmpArg arg;
+ Datum argdatums[] = {exclude_datums[i]};
+
+ memset(&arg, 0, sizeof(arg));
+ arg.datums = argdatums;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+ if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
+ {
+ excluded_part = boundinfo->indexes[offset];
+ count_excluded[excluded_part]++;
+ }
+ }
+
+ excluded_parts = NULL;
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ /*
+ * If all datums of this partition appeared in ne_clauses, exclude
+ * this partition.
+ */
+ if (count_excluded[i] > 0 &&
+ count_excluded[i] == count_partition_datums(relation, i))
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Also, exclude the "null-only" partition, because strict clauses in
+ * ne_clauses will not select any rows from it.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ count_partition_datums(relation, boundinfo->null_index) == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(count_excluded);
+ pfree(exclude_datums);
+
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ result = bms_del_members(result, excluded_parts);
+ bms_free(excluded_parts);
+
+ return result;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to or_partset. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partition.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Classify partition clauses into equal, min, and max keys, along with
+ * any Nullness constraints and return that information in the output
+ * argument 'keys' (Returns true if 'keys' contains valid information
+ * upon return, otherwise false.)
+ *
+ * Clauses in the provided list are implicitly ANDed, each of which is known
+ * to match some partition key column. Map them to individual key columns
+ * and for each column, determine the equal bound or "best" min and max
+ * bounds. For example, of a > 1, a > 2, and a >= 5, "5" is the best min
+ * bound for the column a, which also happens to be an inclusive bound.
+ * When analyzing multiple clauses referencing the same key, it is checked
+ * if there are mutually contradictory clauses and if so, we set *constfalse
+ * to true to indicate to the caller that the set of clauses cannot be true
+ * for any partition. It is also set if the list already contains a
+ * pseudo-constant clause.
+ *
+ * For multi-column keys, an equal bound is returned only if all the columns
+ * are constrained by clauses containing equality operators, unless hash
+ * partitioning is in use, in which case, it's possible that some keys have
+ * IS NULL clauses while remaining have clauses with equality operators.
+ * Min and max bounds could contain bound values for only a prefix of keys.
+ *
+ * All the OR clauses encountered in the list and those generated from certain
+ * ScalarArrayOpExprs are added to *or_clauses. It's the responsibility of the
+ * caller to process the argument clauses of each of the OR clauses, which
+ * would involve recursively calling this function.
+ *
+ * Clauses containing a <> operator are added to *ne_clauses, provided the
+ * operator's negator is a valid partitioning equality operator, and only if
+ * list partitioning is in use.
+ */
+static bool
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses,
+ List **ne_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool will_compute_keys = false;
+ Bitmapset *keyisnull = NULL,
+ *keyisnotnull = NULL;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *ne_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ continue;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to an arbitrary expression, in which case
+ * we fetch the next one from the PartitionKey's partexprs list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_from_ne_clauses().
+ */
+ if (is_ne_listp)
+ *ne_clauses = lappend(*ne_clauses, pc);
+ else
+ {
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = linitial(saop->args),
+ *rightop = lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /* Clause does not match this partition key. */
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is not
+ * listed as part of any operator family. We are still able to
+ * handle it if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j must not clobber the key index i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ keyisnull = bms_add_member(keyisnull, i);
+ else
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ n_keynullness++;
+ will_compute_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape (a bare Var, NOT Var, or
+ * a BooleanTest), which can be accepted only if the partitioning
+ * opfamily is a Boolean one.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (!will_compute_keys || *constfalse)
+ return false;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return false;
+ }
+
+ /*
+ * Generate bounding tuple(s).
+ *
+ * We look up partitions in the partition bound descriptor using, say,
+ * partition_bound_bsearch(), which expects a Datum (or Datums if multi-
+ * column key). So, extract the same out of the constant argument of
+ * each clause.
+ *
+ * Further, based on the strategies of clause operators (=, </<=, >/>=),
+ * try to construct tuples out of those datums that serve as the exact
+ * look-up tuple or minimum/maximum bounding tuple(s). If we find datums
+ * for all partition key columns that appear in = operator clauses, then
+ * we have the look-up tuple to be exactly matched, which will return just
+ * one partition if one exists. If the last value of the tuple comes from
+ * a </<= or >/>= operator, then that constitutes the minimum and maximum
+ * bounding tuple, respectively. There is one exception -- if the tuple
+ * constitutes a proper prefix of partition key columns, with none of its
+ * values coming from a </<= or >/>= operator, we consider such a tuple both
+ * the minimum and maximum bounding tuple. For a multi-column range
+ * partitioned table, there usually exists a sequence of consecutive
+ * partitions that share a prefix of partition bound, which are all
+ * matched by a bounding tuple of the aforementioned shape.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing equality
+ * operators for all partition key columns; if so, we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = keyisnull;
+ keys->keyisnotnull = keyisnotnull;
+
+ return (keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness) > 0;
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'op' contains an =, </<=, or >/>=
+ * operator and sets *incl if equality is implied
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ PartOpStrategy result;
+
+ *incl = false; /* overwritten as appropriate below */
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = PART_OP_EQUAL;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ result = PART_OP_LESS;
+ break;
+ case BTEqualStrategyNumber:
+ *incl = true;
+ result = PART_OP_EQUAL;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ result = PART_OP_GREATER;
+ break;
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to expr's constant value; return false if there is none
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column. If in the
+ * process, it is found that two clauses are mutually contradictory, we simply
+ * stop, set *constfalse to true, and return.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partkeyidx,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partkeyidx],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another whose constant operand doesn't match
+ * the constant operand of the former, we have a case of mutually
+ * contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it. It's possible that mutual
+ * contradiction is proved at some higher level, but it's just
+ * that we couldn't do so here.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best scan key of strategy
+ * type s+1; it is NULL if we haven't yet found such a key for this
+ * attr.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, replace old key. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* The old key is more restrictive, keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equal key with keys of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq key is
+ * a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq key is a = 3, then because 3 < 5, we no longer need a < 5,
+ * because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* discard the redundant key. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * Evaluate 'leftarg op rightarg' and set *result to its value.
+ *
+ * leftarg and rightarg referred to above actually refer to the constant
+ * operand (Datum) of the clause contained in the parameters leftarg and
+ * rightarg below, respectively. And op refers to the operator of the
+ * clause contained in the parameter op below.
+ *
+ * Returns true if we could actually perform the evaluation. False is
+ * returned otherwise, that is, in cases where we couldn't perform the
+ * evaluation for reasons such as operand values being unavailable or
+ * types of operands being incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * If the leftarg_const and rightarg_const are both of the type expected
+ * by op's operator, then compare them using the latter.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions that will need to be scanned for the given
+ * bounding keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is received here.
+ *
+ * Outputs:
+ * Partition set satisfying the keys.
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor
+ * using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Hash partitioning stores partition keys containing nulls in regular
+ * partitions. That is, the code that determines the hash partition for
+ * a given row admits nulls in the partition key when computing the key's
+ * hash. So, here we treat any IS NULL clauses on partition key columns as
+ * equality keys, along with any other non-null values coming from equality
+ * operator clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ int other_idx = -1;
+
+ /*
+ * Only a designated partition accepts nulls; if one exists, return
+ * it. Otherwise, nulls go to the default partition, if any.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo))
+ other_idx = partition_bound_accepts_nulls(boundinfo)
+ ? boundinfo->null_index
+ : boundinfo->default_index;
+ if (other_idx >= 0)
+ return bms_make_singleton(other_idx);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * minoff set to -1 means all datums are greater than minkeys, which
+ * means all partitions satisfy minkeys. In that case, set minoff to
+ * the index of the leftmost datum, viz. 0.
+ *
+ * If the bound at minoff doesn't exactly match minkey or if it does,
+ * but minkey isn't inclusive, move to the bound on the right.
+ */
+ if (minoff == -1 || !is_equal || !keys->min_incl)
+ minoff++;
+
+ /*
+ * boundinfo->ndatums - 1 is the last valid list partition datums
+ * index.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ minoff = -1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * Unlike minoff, we leave maxoff that is set to -1 unchanged, because
+ * it simply means none of the partitions satisfies maxkeys.
+ *
+ * If the bound at maxoff exactly matches maxkey (is_equal), but the
+ * maxkey is not inclusive, then go to the bound on left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some datums
+ * (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ Bitmapset *result = NULL;
+
+ /*
+ * All datums between those at minoff and maxoff satisfy the query
+ * keys, so add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up the partition bound descriptor using
+ * the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there exist
+ * partitions, it must be the default partition.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering the same as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If only a prefix of the whole partition key is provided, there may
+ * be multiple partitions whose bounds share the same prefix. If minkey
+ * is inclusive, we must make minoff point to the leftmost such bound,
+ * making the result contain all such partitions. If it is exclusive,
+ * we must move minoff to the right such that minoff points to the
+ * first partition whose bound is greater than this prefix, thus
+ * excluding all aforementioned partitions from appearing in the
+ * result.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ if (minoff < 0 || minoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, minoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (!keys->min_incl)
+ minoff -= 1;
+ }
+
+ /*
+ * The bound at minoff is known to be <= minkeys, given the way
+ * partition_bound_bsearch works. Considering the same as the lower
+ * bound of the partition that minkeys falls into, the bound at
+ * minoff + 1 would be its upper bound, so use minoff + 1 to get that
+ * partition's index.
+ *
+ * Note that that's the leftmost partition in the sequence of
+ * partitions that all satisfy minkeys.
+ */
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, i.e., one that
+ * satisfies maxkeys.
+ *
+ * Note that there is one more index than there are partition datums in the
+ * range partitioning case.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ int32 cmpval;
+
+ is_equal = false;
+ do
+ {
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ if (maxoff < 0 || maxoff >= boundinfo->ndatums)
+ break;
+ cmpval = partition_bound_cmp(partkey, boundinfo, maxoff,
+ &arg);
+ } while (cmpval == 0);
+
+ /* Back up if went too far. */
+ if (keys->max_incl)
+ maxoff -= 1;
+ }
+
+ /*
+ * The bound at maxoff is known to be <= maxkeys, given the way
+ * partition_bound_bsearch works. Considering the same as the lower
+ * bound of the partition that maxkeys falls into, the bound at
+ * maxoff + 1 would be its upper bound, so use maxoff + 1 to get that
+ * partition's index. Although, if the bound is equal to maxkeys and
+ * the latter is not inclusive, then the bound at maxoff itself is the
+ * upper bound of the partition that maxkeys falls into.
+ *
+ * Note that that's the rightmost partition in the sequence of
+ * partitions that all satisfy maxkeys.
+ */
+ if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ /*
+ * minoff or maxoff set to -1 means none of the datums in
+ * PartitionBoundInfo satisfies both minkeys and maxkeys. If both are set
+ * to a valid datum offset, that means there exists at least some
+ * datums (and hence partitions) satisfying both minkeys and maxkeys.
+ */
+ if (minoff >= 0 && maxoff >= 0)
+ {
+ bool include_def = false;
+ Bitmapset *result = NULL;
+
+
+ /*
+ * If the bound at minoff or maxoff looks like it's an upper bound of
+ * an unassigned range of values, move to the adjacent bound which must
+ * be the upper bound of the leftmost or rightmost partition,
+ * respectively, that needs to be scanned.
+ *
+ * By doing that, we skip over a portion of values that do indeed
+ * satisfy the query, but don't have a valid partition assigned. The
+ * default partition will have to be included to cover those values.
+ * Although, if the original bound in question is an infinite value,
+ * there would not be any unassigned range to speak of, because the
+ * range is unbounded in that direction by definition, so no need to
+ * include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ {
+ include_def = true;
+ }
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There might exist a range of values unassigned to any non-default
+ * range partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+ else
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+
+ Assert(false);
+ return NULL;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cf38b4eb5e..ccfae4f31e 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..8423c6e886 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -73,4 +73,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
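To make the offset arithmetic in the two lookup functions of the patch above easier to follow, here is a minimal standalone sketch. It is not the patch's code: it assumes plain, sorted, distinct int datums, and bound_bsearch(), range_partition_for_key() and list_range_offsets() are hypothetical names. bound_bsearch() only mimics the contract the patch's comments attribute to partition_bound_bsearch() (offset of the greatest bound <= key, or -1). Unlike the patch, list_range_offsets() also reports an empty result when minoff ends up greater than maxoff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for partition_bound_bsearch() over sorted, distinct
 * int datums: returns the offset of the greatest datum <= key, or -1 if
 * every datum is greater than key. *is_equal reports an exact match.
 */
static int
bound_bsearch(const int *datums, int ndatums, int key, bool *is_equal)
{
    int         lo = -1;
    int         hi = ndatums - 1;

    assert(ndatums >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (datums[mid] <= key)
        {
            lo = mid;
            *is_equal = (datums[mid] == key);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Range partitioning, equality key. indexes[] has ndatums + 1 entries;
 * indexes[off + 1] is the partition whose lower bound is datums[off], or -1
 * if that range of values is unassigned. Returns the partition index or -1.
 */
static int
range_partition_for_key(const int *datums, const int *indexes, int ndatums,
                        int key)
{
    bool        is_equal;
    int         off = bound_bsearch(datums, ndatums, key, &is_equal);

    return (off >= 0) ? indexes[off + 1] : -1;
}

/*
 * List partitioning, range query (min </<= x </<= max). Sets *minoff and
 * *maxoff to the leftmost and rightmost qualifying datum offsets and
 * returns true, or returns false if no datum qualifies.
 */
static bool
list_range_offsets(const int *datums, int ndatums,
                   int minkey, bool min_incl, int maxkey, bool max_incl,
                   int *minoff, int *maxoff)
{
    bool        is_equal;

    *minoff = bound_bsearch(datums, ndatums, minkey, &is_equal);
    /* Move right unless the bound matches exactly and minkey is inclusive. */
    if (*minoff == -1 || !is_equal || !min_incl)
        (*minoff)++;
    if (*minoff > ndatums - 1)
        *minoff = -1;

    *maxoff = bound_bsearch(datums, ndatums, maxkey, &is_equal);
    /* Move left if the bound matches exactly but maxkey is exclusive. */
    if (is_equal && !max_incl)
        (*maxoff)--;

    return *minoff >= 0 && *maxoff >= 0 && *minoff <= *maxoff;
}
```

For example, with range bounds {10, 20, 30, 40} describing partitions [10, 20) and [30, 40), the indexes array would be {-1, 0, -1, 1, -1}: key 15 maps to partition 0, while key 25 falls into an unassigned gap and would have to go to the default partition, if one exists.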
Attachment: v19-0003-Move-some-code-of-set_append_rel_size-to-separat.patch (text/plain; charset=UTF-8)
From 9648fe4dba7c49b761eaa32f3e053510efbb9288 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v19 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes the basic properties of a partition
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. The pairwise-join related code will need to call it
to minimally initialize partitions that earlier planning considered
pruned and hence left untouched. That situation does not arise
currently, because the current pruning method touches each partition
(setting its basic properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 12a6ee4a22..f5c11a17cf 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 725694f570..9b4288ad92 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -300,5 +300,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
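For readers following the attr_needed loop that the patch above moves into set_basic_child_rel_properties(), here is a minimal standalone sketch of the attribute-number translation it performs. The names are assumptions made for illustration only: RelidsMask is a hypothetical bitmask standing in for Relids, and translated[] is a plain int array standing in for appinfo->translated_vars, with 0 marking a column dropped from the parent.

```c
#include <assert.h>

typedef unsigned int RelidsMask;    /* hypothetical stand-in for Relids */

/*
 * parent_needed[] and child_needed[] are indexed by (attno - min_attr),
 * where min_attr <= 0 so that system attributes are covered as well.
 * translated[attno - 1] is the child attno for parent user attribute attno,
 * or 0 if that parent column was dropped.
 */
static void
copy_attr_needed(const RelidsMask *parent_needed,
                 int parent_min_attr, int parent_max_attr,
                 RelidsMask *child_needed, int child_min_attr,
                 const int *translated)
{
    int         attno;

    for (attno = parent_min_attr; attno <= parent_max_attr; attno++)
    {
        int         index = attno - parent_min_attr;

        if (attno <= 0)
        {
            /* System attributes occupy the same slots in parent and child. */
            assert(parent_min_attr == child_min_attr);
            child_needed[index] = parent_needed[index];
        }
        else
        {
            int         child_attno = translated[attno - 1];

            /* A dropped parent column has no translation; skip it. */
            if (child_attno == 0)
                continue;
            child_needed[child_attno - child_min_attr] = parent_needed[index];
        }
    }
}
```

The point of the translation is that a child's columns may be numbered differently from the parent's, so the parent's slot for attribute N cannot simply be copied to the child's slot N.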
Attachment: v19-0004-More-refactoring-around-partitioned-table-Append.patch (text/plain; charset=UTF-8)
From b144c9bce081550a376cc97128093203bbdacb01 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v19 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
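As a rough illustration of the second paragraph of the commit message above: a minimal standalone sketch, under simplifying assumptions, of how per-rel lists of "live" partitioned rels could be concatenated for the members of a join. SketchRel and collect_live_partitioned_rels() are hypothetical names, and an unsigned long bitmask stands in for the planner's Bitmapset; this is not the planner's code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 32

/* Hypothetical, pared-down stand-in for RelOptInfo. */
typedef struct SketchRel
{
    int         nlive;              /* number of valid entries in live[] */
    int         live[MAX_RELS];     /* RT indexes of live partitioned rels */
} SketchRel;

/*
 * Walk the members of join_relids and append each member's live list to
 * result[], which has room for result_size entries. Empty array slots are
 * skipped. Returns the number of RT indexes written.
 */
static int
collect_live_partitioned_rels(SketchRel **rel_array, int array_size,
                              unsigned long join_relids,
                              int *result, int result_size)
{
    int         n = 0;
    int         relid;

    for (relid = 0; relid < array_size && relid < MAX_RELS; relid++)
    {
        const SketchRel *rel;
        int         i;

        if ((join_relids & (1UL << relid)) == 0)
            continue;
        rel = rel_array[relid];
        if (rel == NULL)
            continue;
        for (i = 0; i < rel->nlive; i++)
        {
            assert(n < result_size);
            result[n++] = rel->live[i];
        }
    }
    return n;
}
```

Because each rel carries only the partitioned children that survived pruning, the combined list never mentions pruned ones, which a walk over a global list of all partitioned children cannot guarantee.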
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 ++++++-
4 files changed, 115 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f5c11a17cf..83035a3e8d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so the loop below collects the
+ * lists of those children that are themselves partitioned and concatenates
+ * them all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining the live_partitioned_rels of the component partitioned
+ * tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7b52dadd81..b0f6051618 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..4b5d50eb2c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 71689b8ed6..63623f2687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
v19-0005-Teach-planner-to-use-get_partitions_from_clauses.patch
From b0013377907f57dee9ee609f0549b3d135a7d2b8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v19 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting which of a table's partitions to scan
applies constraint exclusion to each partition's partition constraint:
it compares the query's clauses against the constraint and excludes
the partition if the clauses refute it. A dummy path is added for
each excluded partition. This takes time linear in the number of
partitions with a big constant, especially given that we repeat the
work of matching clauses to the partition constraint for every
partition.
Instead, we can match the clauses only once, by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
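To make the cost argument in the commit message concrete, here is a minimal, self-contained sketch in C. It is a toy model and not the code in this patch: the names partition_accepts(), find_partition_linear(), find_partition_bsearch() and example_bounds are hypothetical, plain integer range bounds stand in for a real partition bound descriptor, and nothing here reflects how get_partitions_from_clauses() is actually implemented. It only contrasts testing every partition's constraint with a single lookup in the parent's sorted bounds, for a clause of the form key = value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (an assumption for illustration): nbounds sorted bounds define
 * nbounds + 1 range partitions.  Partition i holds keys in
 * [bounds[i - 1], bounds[i]); partition 0 is unbounded below and the last
 * partition is unbounded above.
 */
static const int example_bounds[] = {10, 20, 30};

/* Per-partition check: does partition i's constraint admit key == value? */
static bool
partition_accepts(const int *bounds, int nbounds, int i, int value)
{
    bool above_lower = (i == 0) || (value >= bounds[i - 1]);
    bool below_upper = (i == nbounds) || (value < bounds[i]);

    return above_lower && below_upper;
}

/* Constraint-exclusion style: test the clause against every partition. */
static int
find_partition_linear(const int *bounds, int nbounds, int value)
{
    for (int i = 0; i <= nbounds; i++)
    {
        if (partition_accepts(bounds, nbounds, i, value))
            return i;
    }
    return -1;                  /* not reached: the partitions cover all keys */
}

/*
 * Bound-driven style: one binary search over the parent's bounds.  The
 * answer is the number of bounds that are <= value.
 */
static int
find_partition_bsearch(const int *bounds, int nbounds, int value)
{
    int lo = 0;
    int hi = nbounds;

    assert(nbounds >= 0);
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (bounds[mid] <= value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

With example_bounds, both functions map the value 25 to partition index 2; the second gets there without visiting each partition in turn.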
src/backend/optimizer/path/allpaths.c | 399 ++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 917 insertions(+), 78 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 83035a3e8d..c9c998bd34 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,390 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key. Also check whether the
+ * matched clauses contain constant values that can be used for pruning,
+ * and call get_partitions_from_clauses() only if so. In addition, find
+ * out whether rel->baserestrictinfo contains a constant-false clause, in
+ * which case no partition needs to be scanned.
+ */
+ partclauses = match_clauses_to_partkey(root, rel, rel->baserestrictinfo,
+ &contains_const, &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * There are no clauses that are useful to prune any partitions, so
+ * scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *child_rte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == child_rte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * Returned list contains clauses matched to the partition key columns and
+ * *contains_const and *constfalse are set as described below.
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately. If a NullTest against a partition key is encountered, it's
+ * added to the result as well.
+ *
+ * *contains_const is set if at least one matched clause contains a constant
+ * operand or is a nullness test. *constfalse is set if the input list
+ * contains a pseudo-constant RestrictInfo with a false (or null) value.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert(partscheme != NULL);
+
+ /* Make a copy, because we may scribble on it below. */
+ clauses = list_copy(clauses);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of the OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = *contains_const || matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = pull_varnos((Node *) rightop);
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = pull_varnos((Node *) leftop);
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that we can reason sanely
+ * about the behavior with null arguments.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ /* Don't clobber a true value set by an earlier clause. */
+ if (!*contains_const)
+ *contains_const = IsA(eval_const_expressions(root, rightop),
+ Const);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1282,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1d152c514e..d9249f4c33 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even a dead partition look
+ * minimally valid, with a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8c60b35068..c103deb21b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1247,22 +1246,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1929,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 63623f2687..855d51ea09 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might use a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..83e60814f7 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..13b12078bf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 12 January 2018 at 15:27, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/11 19:23, David Rowley wrote:
ERROR: operator 531 is not a member of opfamily 1976
You'll be able to see that the error no longer appears with the attached
updated set of patches, but I'm now seeing that the resulting plan with
patched for this particular query differs from what master (constraint
exclusion) produces. Master produces a plan with no partitions (as one
would think is the correct plan), whereas patched produces a plan
including the xy1 partition. I will think about that a bit and post
something later.
Thanks for looking at that.
I've got a few more things for you. I'm only partway through another
pass, but it makes sense to post what I have now if you're working on
a new version.
1. partitioing -> partitioning
* Strategy of a partition clause operator per the partitioing operator class
2. get_partitions_from_clauses() modifies partclauses without
mentioning it in the header. I think you need to do one of the following:
a) warn about this in the header comment;
b) do a list_copy() before list_concat(); or
c) do list_truncate back to the original length after you're done with the list.
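To illustrate option (c), here is a minimal sketch that uses a toy array-backed list instead of PostgreSQL's List. ToyList, toy_list_concat(), toy_list_truncate() and sum_with_extra() are hypothetical names invented for this example. The only assumption carried over from the review is that the concat step modifies its first argument in place.

```c
#include <assert.h>

#define TOY_LIST_MAX 16

/* Hypothetical stand-in for a List of clauses; not PostgreSQL's List. */
typedef struct ToyList
{
    int items[TOY_LIST_MAX];
    int length;
} ToyList;

/* Appends src to dst in place, so the caller's dst is modified. */
static void
toy_list_concat(ToyList *dst, const ToyList *src)
{
    int i;

    assert(dst->length + src->length <= TOY_LIST_MAX);
    for (i = 0; i < src->length; i++)
        dst->items[dst->length++] = src->items[i];
}

/* Shrinks the list back to new_length elements. */
static void
toy_list_truncate(ToyList *list, int new_length)
{
    assert(new_length >= 0 && new_length <= list->length);
    list->length = new_length;
}

/*
 * Works on clauses plus extra, then restores clauses to the length the
 * caller passed in, so the modification is not visible afterwards.
 */
static int
sum_with_extra(ToyList *clauses, const ToyList *extra)
{
    int saved_length = clauses->length;
    int sum = 0;
    int i;

    toy_list_concat(clauses, extra);
    for (i = 0; i < clauses->length; i++)
        sum += clauses->items[i];
    toy_list_truncate(clauses, saved_length);

    return sum;
}
```

Option (b) would instead copy clauses before concatenating, which costs an allocation but never touches the caller's list at all.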
3. get_partitions_from_clauses_recurse(), with:
result = bms_add_range(result, 0, partdesc->nparts - 1);
You could change that to bms_add_range(NULL, ...) and ditch the
assignment of result to NULL at the start of the function.
4. classify_partition_bounding_keys() now returns bool, but the return
statement is still:
return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
My compiler didn't warn about that, but I'd imagine some might.
Instead, can you make it:
if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0)
return true;
return false;
Equality keys are probably the most likely case, so it'll be good to
short-circuit instead of performing addition on a bunch of stuff we
don't care about anymore.
5. In classify_partition_bounding_keys, why do we "continue" here?
clause = rinfo->clause;
if (rinfo->pseudoconstant &&
!DatumGetBool(((Const *) clause)->constvalue))
{
*constfalse = true;
continue;
}
Is there any point in searching further?
Also, if you were consistent with the return value for
classify_partition_bounding_keys when you've set *constfalse = true;
you wouldn't need to handle the case twice like you are in
get_partitions_from_clauses_recurse().
6. I think it would be nicer if get_partitions_from_ne_clauses returns
a set of partitions that could be excluded.
So instead of:
* get_partitions_from_ne_clauses
*
* Return partitions of relation that satisfy all <> operator clauses in
* ne_clauses. Only ever called if relation is a list partitioned table.
Have:
* get_partitions_from_ne_clauses
*
* Returns a Bitmapset of partitions that can be safely excluded due to
* not-equal clauses existing for all possible partition values. It is only
* valid to call this for LIST partitioned tables.
and instead of:
result = bms_add_range(NULL, 0, partdesc->nparts - 1);
result = bms_del_members(result, excluded_parts);
bms_free(excluded_parts);
return result;
Just do:
return excluded_parts;
and in get_partitions_from_clauses_recurse(), do bms_del_members
instead of bms_int_members.
There's less bit shuffling and it seems cleaner. Perhaps the function
name would need to be changed if we're inverting the meaning too.
(I've attached a patch which makes this change along with an idea in #8 below)
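The equivalence behind #6 can be sketched with plain bitmasks standing in for Bitmapsets. Both function names below are hypothetical, and the sketch assumes that result only ever contains partition indexes below nparts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old shape: build the set of all partitions, remove the excluded ones,
 * then intersect that with the running result.
 */
static uint32_t
prune_by_intersection(uint32_t result, uint32_t excluded, int nparts)
{
    uint32_t all;
    uint32_t surviving;

    assert(nparts >= 1 && nparts <= 32);
    all = (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
    surviving = all & ~excluded;

    return result & surviving;
}

/* New shape: delete the excluded members from the result directly. */
static uint32_t
prune_by_deletion(uint32_t result, uint32_t excluded)
{
    return result & ~excluded;
}
```

As long as result is a subset of the full partition set, the two return the same answer, and the second form skips building and freeing the intermediate set.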
7. The following comment claims the function sets *datum, but there's
no param by that name:
/*
* partkey_datum_from_expr
* Extract constant value from expr and set *datum to that value
*/
static bool
partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value)
8. The code in get_partitions_from_ne_clauses() does perform quite a
few nested loops. I think a simpler way would be to track the
offsets you've seen in a Bitmapset. This would save you having to
check for duplicates, as an offset can only contain a single datum.
You'd just need to build a couple of arrays after that, one to sum up
the offsets found per partition, and one for the total datums allowed
in the partition. If the numbers match then you can remove the
partition.
I've written this and attached it to this email. It saves about 50
lines of code and should perform much better for complex cases, for
example, a large NOT IN list. This also implements #6.
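A self-contained sketch of the counting idea in #8, outside of PostgreSQL, might look like the following. The bitmask found_offsets plays the role of the Bitmapset of seen offsets and the indexes array stands in for boundinfo->indexes[]. The function name, MAX_PARTS and the 32-datum limit are assumptions made only for this toy version.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PARTS 32

/*
 * indexes[i] is the partition that accepts the i-th sorted bound datum.
 * Bit i of found_offsets is set when some "<>" clause names that datum;
 * because it is a set, duplicate clauses collapse for free.
 *
 * Returns a bitmask of partitions whose every permitted datum was named.
 */
static uint32_t
toy_partitions_excluded_by(const int *indexes, int ndatums, int nparts,
                           uint32_t found_offsets)
{
    int datums_in_part[MAX_PARTS];
    int datums_found[MAX_PARTS];
    uint32_t excluded = 0;
    int i;

    assert(nparts >= 1 && nparts <= MAX_PARTS);
    assert(ndatums >= 0 && ndatums <= 32);

    memset(datums_in_part, 0, sizeof(datums_in_part));
    memset(datums_found, 0, sizeof(datums_found));

    /* One pass over the bound datums fills both per-partition counters. */
    for (i = 0; i < ndatums; i++)
    {
        assert(indexes[i] >= 0 && indexes[i] < nparts);
        datums_in_part[indexes[i]]++;
        if (found_offsets & (UINT32_C(1) << i))
            datums_found[indexes[i]]++;
    }

    /*
     * A partition with no listed datums (such as a default partition) has
     * zero in both arrays, so requiring at least one found datum keeps it.
     */
    for (i = 0; i < nparts; i++)
    {
        if (datums_found[i] > 0 && datums_found[i] == datums_in_part[i])
            excluded |= UINT32_C(1) << i;
    }

    return excluded;
}
```

For example, with sorted datums a..g mapped to partitions {0, 1, 1, 0, 2, 2, 3} and clauses naming 'a' and 'd' (offsets 0 and 3), only partition 0 is returned; naming 'a' alone excludes nothing.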
9. "the same" -> "it"
/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle the same if its negator is indeed a part of the
* partitioning operator family.
*/
10. in classify_partition_bounding_keys: "0" -> "false"
/* Return if no work to do below. */
if (!will_compute_keys || *constfalse)
return 0;
Likewise for:
if (*constfalse)
return 0;
11. I don't see partition_bound_bsearch used anywhere below the
following comment:
* Generate bounding tuple(s).
*
* We look up partitions in the partition bound descriptor using, say,
* partition_bound_bsearch(), which expects a Datum (or Datums if multi-
* column key). So, extract the same out of the constant argument of
* each clause.
I also don't know what the comment is trying to say.
12.
* operator and sets *incl if equality is implied
should be:
* operator and set *incl to true if the operator's strategy is inclusive.
13. What does "the same" mean in:
* and add this one directly to the result. Caller would
* arbitrarily choose one of the many and perform
* partition-pruning with the same. It's possible that mutual
I think you quite often use "the same" to mean "it". Can you change that?
14. Not sure what parameter you're talking about here.
* Evaluate 'leftarg op rightarg' and set *result to its value.
*
* leftarg and rightarg referred to above actually refer to the constant
* operand (Datum) of the clause contained in the parameters leftarg and
* rightarg below, respectively. And op refers to the operator of the
* clause contained in the parameter op below.
15. "the latter" is normally used when you're referring to the last
thing in a list which was just mentioned. In this case, leftarg_const
and rightarg_const is the list, so "the latter" should mean
rightarg_const, but I think you mean to compare them using the
operator.
* If the leftarg_const and rightarg_const are both of the type expected
* by op's operator, then compare them using the latter.
16. There are a few things to improve with the following comment:
/*
* Hash partitioning stores partition keys containing nulls in regular
* partitions. That is, the code that determines the hash partition for
* a given row admits nulls in the partition key when computing the key's
* hash. So, here we treat any IS NULL clauses on partition key columns as
* equality keys, along with any other non-null values coming from equality
* operator clauses.
*/
"admits" is not the correct word here, and "hash" should be "correct",
but there are more mistakes, so might be easier just to rewrite to:
/*
* Since tuples with NULL values in the partition key columns are stored
* in regular partitions, we'll treat any IS NULL clauses here as regular
* equality clauses.
*/
17. The following example will cause get_partitions_for_keys_hash to misbehave:
create table hashp (a int, b int) partition by hash (a, b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
explain select * from hashp where a = 1 and a is null;
The following code assumes that you'll never get a NULL test for a key
that has an equality test, and ends up trying to prune partitions
thinking we got compatible clauses for both partition keys.
memset(keyisnull, false, sizeof(keyisnull));
for (i = 0; i < partkey->partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
}
/*
* Can only do pruning if we know all the keys and they're all equality
* keys including the nulls that we just counted above.
*/
if (keys->n_eqkeys == partkey->partnatts)
The above code will need to be made smarter. It'll likely crash if you
change "b" to a pass-by-ref type.
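One possible way to make the check smarter, sketched here with bitmasks rather than the patch's actual data structures, is to track which key columns have an equality clause and which have an IS NULL clause, instead of bumping a single counter. With "a = 1 and a is null" the counter reaches 2 even though "b" is unconstrained; the per-column form sees the overlap on "a". The enum and function names are hypothetical, and this is not necessarily the fix the patch should adopt.

```c
#include <assert.h>
#include <stdint.h>

typedef enum ToyKeyCoverage
{
    TOY_KEYS_INCOMPLETE,        /* some key column has no usable clause */
    TOY_KEYS_CONTRADICTORY,     /* same column has both "= const" and IS NULL */
    TOY_KEYS_COMPLETE           /* every key column is covered exactly once */
} ToyKeyCoverage;

/*
 * Bit i of eq_keys is set if key column i has an equality clause;
 * bit i of null_keys is set if it has an IS NULL clause.
 */
static ToyKeyCoverage
toy_classify_hash_keys(int partnatts, uint32_t eq_keys, uint32_t null_keys)
{
    uint32_t all_keys;

    assert(partnatts >= 1 && partnatts <= 31);
    all_keys = (UINT32_C(1) << partnatts) - 1;
    eq_keys &= all_keys;
    null_keys &= all_keys;

    /* Equality is strict, so "a = 1 AND a IS NULL" can match no row. */
    if (eq_keys & null_keys)
        return TOY_KEYS_CONTRADICTORY;

    /* Hash pruning needs a value (or NULL) for every key column. */
    if ((eq_keys | null_keys) == all_keys)
        return TOY_KEYS_COMPLETE;

    return TOY_KEYS_INCOMPLETE;
}
```

In the contradictory case a caller could treat the clause list as constant-false and scan no partitions, rather than attempting to compute a hash from an incomplete set of keys.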
18. The following code:
int other_idx = -1;
/*
* Only a designated partition accepts nulls, which if there
* exists one, return the same.
*/
if (partition_bound_accepts_nulls(boundinfo) ||
partition_bound_has_default(boundinfo))
other_idx = partition_bound_accepts_nulls(boundinfo)
? boundinfo->null_index
: boundinfo->default_index;
if (other_idx >= 0)
return bms_make_singleton(other_idx);
else
return NULL;
should be simplified to:
/*
* NULLs may only exist in the NULL partition, or in the
* default, if there's no NULL partition.
*/
if (partition_bound_accepts_nulls(boundinfo))
return bms_make_singleton(boundinfo->null_index);
else if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
return NULL;
19. "exists" -> "are"
* If there are no datums to compare keys with, but there exist
* partitions, it must be the default partition.
Also, instead of writing "it must be the default partition.", it would
be better to say "just return the default partition."
20. I don't think the return NULL should ever be hit. Is it worth adding
a comment to say /* shouldn't happen */?
if (boundinfo->ndatums == 0)
{
if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
}
21. The following comment does not explain the situation well:
/*
* boundinfo->ndatums - 1 is the last valid list partition datums
* index.
*/
There's really no possible non-default partition for this case, so
perhaps we should just return the default, if one exists. We do go on
to check the n_maxkeys needlessly for this case. At the very least the
comment should be changed to:
/*
* minkeys values are greater than any non-default partition.
* We'll check for that case below.
*/
but I think it's worth just doing the default partition check there
and returning it, or NULL. It should help reduce confusion.
Can you also perform a self-review of the patch? Some of the things
I'm picking up are leftovers from a previous version of the patch. We
might never get through this review if you keep leaving those around!
I won't continue reviewing again until next week, so don't rush.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
get_partitions_excluded_by.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0d9c774..57ea4a7 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -291,8 +291,8 @@ PG_FUNCTION_INFO_V1(satisfies_hash_partition);
static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
int rt_index, List *clauses);
-static Bitmapset *get_partitions_from_ne_clauses(Relation relation,
- List *ne_clauses);
+static Bitmapset *get_partitions_excluded_by(Relation relation,
+ List *ne_clauses);
static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
int rt_index, List *or_clause_args);
static bool classify_partition_bounding_keys(Relation relation, List *clauses,
@@ -1787,15 +1787,10 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
{
Bitmapset *ne_clause_parts;
- ne_clause_parts = get_partitions_from_ne_clauses(relation, ne_clauses);
+ ne_clause_parts = get_partitions_excluded_by(relation, ne_clauses);
- /*
- * Clauses in ne_clauses are in conjunction with the clauses that
- * selected the partitions contained in result, so combine the
- * partitions thus selected with those in result using set
- * intersection.
- */
- result = bms_int_members(result, ne_clause_parts);
+ /* Remove any matched partitions */
+ result = bms_del_members(result, ne_clause_parts);
bms_free(ne_clause_parts);
}
@@ -1825,151 +1820,105 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
(0 == DatumGetInt32(FunctionCall2Coll(&partkey->partsupfunc[0],\
partkey->partcollation[0],\
(d1), (d2))))
-/*
- * Check if d is equal to some member of darray where equality is determined
- * by the partitioning comparison function.
- */
-static bool
-datum_in_array(PartitionKey partkey, Datum d, Datum *darray, int n)
-{
- int i;
-
- if (darray == NULL || n == 0)
- return false;
-
- for (i = 0; i < n; i++)
- if (partkey_datums_equal(d, darray[i]))
- return true;
-
- return false;
-}
-
-/*
- * count_partition_datums
- *
- * Returns the number of non-null datums allowed by a non-default list
- * partition with given index.
- */
-static int
-count_partition_datums(Relation rel, int index)
-{
- PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
- int i,
- result = 0;
-
- Assert(index != boundinfo->default_index);
-
- /*
- * The answer is as many as the count of occurrence of the value index
- * in boundinfo->indexes[].
- */
- for (i = 0; i < boundinfo->ndatums; i++)
- if (index == boundinfo->indexes[i])
- result += 1;
-
- return result;
-}
/*
- * get_partitions_from_ne_clauses
+ * get_partitions_excluded_by
*
- * Return partitions of relation that satisfy all <> operator clauses in
- * ne_clauses. Only ever called if relation is a list partitioned table.
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
*/
static Bitmapset *
-get_partitions_from_ne_clauses(Relation relation, List *ne_clauses)
+get_partitions_excluded_by(Relation relation, List *ne_clauses)
{
- ListCell *lc;
- Bitmapset *result,
- *excluded_parts;
- PartitionKey partkey = RelationGetPartitionKey(relation);
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ ListCell *lc;
+ Bitmapset *excluded_parts = NULL;
+ Bitmapset *foundoffsets = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
PartitionBoundInfo boundinfo = partdesc->boundinfo;
- Datum *exclude_datums;
- int *count_excluded,
- n_exclude_datums,
- i;
+ PartitionBoundCmpArg arg;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
- /*
- * How this works:
- *
- * For each constant expression, we look up the partition that would
- * contain its value and mark the same as excluded partition. After
- * doing the same for all clauses we'll have set of partitions that
- * are excluded. For each excluded partition, check if there exist
- * values that it allows but are not specified in the clauses, if so
- * the partition won't actually be excluded.
- */
+ memset(&arg, 0, sizeof(arg));
- /* De-duplicate constant values. */
- exclude_datums = (Datum *) palloc0(list_length(ne_clauses) *
- sizeof(Datum));
- n_exclude_datums = 0;
+ /* build a Bitmapset to record the offsets of all datums found */
foreach(lc, ne_clauses)
{
- PartClause *pc = lfirst(lc);
+ PartClause *pc = (PartClause *) lfirst(lc);
Datum datum;
- if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum) &&
- !datum_in_array(partkey, datum, exclude_datums, n_exclude_datums))
- exclude_datums[n_exclude_datums++] = datum;
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ arg.datums = &datum;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+
+ if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
}
+ /* No partitions can be excluded if we found no valid offsets above */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
/*
- * For each value, if it's found in boundinfo, increment the count of its
- * partition as excluded due to that value.
+ * Since each list partition can have multiple values in the IN clause, we
+ * must ensure that we got all values in that clause before we can
+ * eliminate the entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums we found, and another to record the number of datums permitted
+ * in each partition. Once we've counted all this, we can eliminate any
+ * partition where the number of datums found match the number of datums
+ * allowed in the partition.
*/
- count_excluded = (int *) palloc0(partdesc->nparts * sizeof(int));
- for (i = 0; i < n_exclude_datums; i++)
- {
- int offset,
- excluded_part;
- bool is_equal;
- PartitionBoundCmpArg arg;
- Datum argdatums[] = {exclude_datums[i]};
-
- memset(&arg, 0, sizeof(arg));
- arg.datums = argdatums;
- arg.ndatums = 1;
- offset = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
- if (offset >= 0 && is_equal && boundinfo->indexes[offset] >= 0)
- {
- excluded_part = boundinfo->indexes[offset];
- count_excluded[excluded_part]++;
- }
- }
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
- excluded_parts = NULL;
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /* Now, in a single pass of the partitions, count the datums it permits */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Compare the counts, eliminate any partition that we found clause for
+ * all possible values. We must be careful here not to include any default
+ * partition. In this case both arrays will contain zero, so we can just
+ * simply ensure we only eliminate when we found at least 1 datum.
+ */
for (i = 0; i < partdesc->nparts; i++)
{
- /*
- * If all datums of this partition appeared in ne_clauses, exclude
- * this partition.
- */
- if (count_excluded[i] > 0 &&
- count_excluded[i] == count_partition_datums(relation, i))
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
excluded_parts = bms_add_member(excluded_parts, i);
}
/*
- * Also, exclude the "null-only" partition, because strict clauses in
- * ne_clauses will not select any rows from it.
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition providing it does not also allow non-NULL values.
*/
if (partition_bound_accepts_nulls(boundinfo) &&
- count_partition_datums(relation, boundinfo->null_index) == 0)
+ datums_in_part[boundinfo->null_index] == 0)
excluded_parts = bms_add_member(excluded_parts,
boundinfo->null_index);
- pfree(count_excluded);
- pfree(exclude_datums);
-
- result = bms_add_range(NULL, 0, partdesc->nparts - 1);
- result = bms_del_members(result, excluded_parts);
- bms_free(excluded_parts);
+ pfree(datums_in_part);
+ pfree(datums_found);
- return result;
+ return excluded_parts;
}
/*
@@ -2251,7 +2200,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
/*
* We don't turn a <> operator clause into a key right away.
* Instead, the caller will hand over such clauses to
- * get_partitions_from_ne_clauses().
+ * get_partitions_excluded_by().
*/
if (is_ne_listp)
*ne_clauses = lappend(*ne_clauses, pc);
David,
On 2018/01/12 12:30, David Rowley wrote:
Can you also perform a self-review of the patch? Some of the things
I'm picking up are leftovers from a previous version of the patch. We
might never get through this review if you keep leaving those around!
Sorry, I will look more closely before posting the next version. I guess
I may have rushed a bit too much when posting the v18/v19 patches, partly
because it's been 3 weeks since v17 and I felt I needed to catch up
quickly given the activity on the run-time pruning thread which depends on
the patches here.
Thanks,
Amit
On 10 January 2018 at 17:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.A test case is:
create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list (b);
create table part2 partition of part1 for values in(1);select * from part;
partition_rels ends up with 3 items in the list, but there's only 2
partitions here. The reason for this is that, since planning here is
recursively calling add_paths_to_append_rel, the list for part ends up
with itself and part1 in it, then since part1's list already contains
itself, per set_append_rel_size's "rel->live_partitioned_rels =
list_make1_int(rti);", then part1 ends up in the list twice.It would be nicer if you could use a RelIds for this, but you'd also
need some way to store the target partition relation since
nodeModifyTable.c does:/* The root table RT index is at the head of the partitioned_rels list */
if (node->partitioned_rels)
{
Index root_rti;
Oid root_oid;root_rti = linitial_int(node->partitioned_rels);
root_oid = getrelid(root_rti, estate->es_range_table);
rel = heap_open(root_oid, NoLock); /* locked by InitPlan */
}You could also fix it by instead of doing:
/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
childrel->live_partitioned_rels);
do something along the lines of:
if (childrel->part_scheme)
{
ListCell *lc;
ListCell *start = lnext(list_head(childrel->live_partitioned_rels));
for_each_cell(lc, start)
partitioned_rels = lappend_int(partitioned_rels,
lfirst_int(lc));
}
Although it seems pretty fragile. It would probably be better to find
a nicer way of handling all this.
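The duplication described in the quoted review can be reproduced outside the planner. The following is a minimal sketch, assuming a plain fixed-size integer array in place of PostgreSQL's List; IntList, il_append, concat_all and concat_skip_head are hypothetical names used only for illustration. It shows why concatenating a child's whole list repeats the child's own RT index, and how skipping the head avoids that.

```c
#include <stddef.h>

/* Hypothetical stand-in for an integer list; not PostgreSQL's List API. */
typedef struct IntList
{
    int     items[16];
    size_t  len;
} IntList;

/* Append one value, silently ignoring it if the fixed buffer is full. */
static void
il_append(IntList *l, int v)
{
    if (l->len < sizeof(l->items) / sizeof(l->items[0]))
        l->items[l->len++] = v;
}

/* Append every member of src, including its head (the child's own RT index). */
static void
concat_all(IntList *dst, const IntList *src)
{
    for (size_t i = 0; i < src->len; i++)
        il_append(dst, src->items[i]);
}

/* Append src minus its head, assuming the parent has already recorded it. */
static void
concat_skip_head(IntList *dst, const IntList *src)
{
    for (size_t i = 1; i < src->len; i++)
        il_append(dst, src->items[i]);
}
```

With a parent list of (1 3) and a child list of (3), concat_all yields (1 3 3), while concat_skip_head leaves (1 3). As the review notes, relying on the head position is fragile; a set-like structure would avoid the problem altogether.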
Hi Amit,
I also noticed earlier that this is still broken in v19.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi.
On 2018/01/12 18:09, David Rowley wrote:
On 10 January 2018 at 17:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.
[ ... ]
I also noticed earlier that this is still broken in v19.
I cannot see the duplication here (with v19 + some local changes per your
latest review, although I had fixed the issue in v18).
create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in (1) partition by list (b);
create table part2 partition of part1 for values in (1);
select * from part;
For the above query, I set a breakpoint all the way in ExecInitAppend() to
see what partitioned_rels list it ends up with and I see no duplication:
:partitioned_rels (i 1 3)
where 1 and 3 are RT indexes of part and part1, respectively.
With v17, you'd be able to see the duplication:
:partitioned_rels (i 1 3 3)
Let me confirm again if you were complaining exactly of this duplication?
That the RT index of part1 appears twice due to the bug I claim I fixed in
v18? Or something else?
Thanks,
Amit
On 12 January 2018 at 22:51, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/12 18:09, David Rowley wrote:
On 10 January 2018 at 17:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
Basically, the changes to add_paths_to_append_rel() are causing
duplication in partition_rels.
I also noticed earlier that this is still broken in v19.
I cannot see the duplication here (with v19 + some local changes per your
latest review, although I had fixed the issue in v18).
I may have made a mistake there. The code I expected to change didn't.
I meant to test the case again, but I got distracted just before I did
and came back a while later and forgot that I hadn't tested.
If you've tested my case and it works, then please don't look any
further. I will look when v20 is ready.
Sorry for the false alarm... I must've been trying to do too many
things at once :-(
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 11, 2018 at 10:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Instead, can you make it:
if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0)
return true;
return false;
Or better yet:
return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0);
It's not really necessary to write if (some condition is true) return
true; return false when you can just write return (boolean-valued
condition).
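As a self-contained illustration of the suggestion above, here is a minimal sketch, assuming a hypothetical KeyCounts struct whose fields mirror the counters named in the review (the real code keeps n_keynullness as a separate local variable).

```c
#include <stdbool.h>

/* Hypothetical counters mirroring the names used in the review. */
typedef struct KeyCounts
{
    int n_eqkeys;
    int n_minkeys;
    int n_maxkeys;
    int n_keynullness;
} KeyCounts;

/*
 * Return the boolean-valued condition directly.  The || operator
 * short-circuits, so evaluation stops at the first non-zero counter.
 */
static bool
has_any_keys(const KeyCounts *k)
{
    return (k->n_eqkeys > 0 || k->n_minkeys > 0 ||
            k->n_maxkeys > 0 || k->n_keynullness > 0);
}
```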
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi David.
Thanks for the review.
On 2018/01/12 12:30, David Rowley wrote:
I've got a few more things for you. I'm only partway through another
pass, but it makes sense to post what I have now if you're working on
a new version.
1. partitioing -> partitioning
* Strategy of a partition clause operator per the partitioing operator class
Fixed.
2. get_partitions_from_clauses() modifies partclauses without
mentioning it in the header. I think you need to either:
a) warn about this in the header comment; or
b) do a list_copy() before list_concat()
c) do list_truncate back to the original length after you're done with the list.
Went with (b).
3. get_partitions_from_clauses_recurse(), with:
result = bms_add_range(result, 0, partdesc->nparts - 1);
You could change that to bms_add_range(NULL, ...) and ditch the
assignment of result to NULL at the start of the function.
Done.
4. classify_partition_bounding_keys() now returns bool, but the return
statement is still:
return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
my compiler didn't warn about that, but I'd imagine some might.
Oops, my bad.
Instead, can you make it:
if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || n_keynullness > 0)
return true;
return false;
probably equal keys are the most likely case, so it'll be good to
short circuit instead of performing addition on a bunch of stuff we
don't care about anymore.
Changed it to what Robert suggested downthread.
5. In classify_partition_bounding_keys, why do we "continue" here?
clause = rinfo->clause;
if (rinfo->pseudoconstant &&
!DatumGetBool(((Const *) clause)->constvalue))
{
*constfalse = true;
continue;
}
Is there any point in searching further?
Also, if you were consistent with the return value for
classify_partition_bounding_keys when you've set *constfalse = true;
you wouldn't need to handle the case twice like you are in
get_partitions_from_clauses_recurse().
OK, I made classify_partition_bounding_keys() return true whenever it sets
*constfalse to true.
6. I think it would be nicer if get_partitions_from_ne_clauses returns
a set of partitions that could be excluded.
So instead of:
* get_partitions_from_ne_clauses
*
* Return partitions of relation that satisfy all <> operator clauses in
* ne_clauses. Only ever called if relation is a list partitioned table.
Have:
* get_partitions_from_ne_clauses
*
* Returns a Bitmapset of partitions that can be safely excluded due to
* not-equal clauses existing for all possible partition values. It is only
* valid to call this for LIST partitioned tables.
and instead of:
result = bms_add_range(NULL, 0, partdesc->nparts - 1);
result = bms_del_members(result, excluded_parts);
bms_free(excluded_parts);
return result;
Just do:
return excluded_parts;
and in get_partitions_from_clauses_recurse(), do bms_del_members
instead of bms_int_members.
There's less bit shuffling and it seems cleaner. Perhaps the function
name would need to be changed if we're inverting the meaning too.
(I've attached a patch which makes this change along with an idea in #8 below)
Thanks for the suggestions... (comment continues below)
7. The following comment claims the function sets *datum, but there's
no param by that name:
/*
* partkey_datum_from_expr
* Extract constant value from expr and set *datum to that value
*/
static bool
partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value)
Fixed.
8. The code in get_partitions_from_ne_clauses() does perform quite a
few nested loops. I think a simpler way would be to track the
offsets you've seen in a Bitmapset. This would save you having to
check for duplicates, as an offset can only contain a single datum.
You'd just need to build a couple of arrays after that, one to sum up
the offsets found per partition, and one for the total datums allowed
in the partition. If the numbers match then you can remove the
partition.
I've written this and attached it to this email. It saves about 50
lines of code and should perform much better for complex cases, for
example, a large NOT IN list. This also implements #6.
I liked your patch, so incorporated it, except, I feel slightly
uncomfortable about the new name that you chose for the function because
it sounds a bit generic. I mean the function solves a very specific
problem and has very strict requirements for calling it. It's not like
we could pass it just any partitioned relation and/or just any set of
clauses. It has to be a list-partitioned table and the list of clauses
must contain only the clauses containing compatible <> operators. Checks
for those requirements are carried out in yet another place, that is,
classify_partition_bounding_keys().
Perhaps we can live with that though, because it's not a publicly
available function, but someone might get confused in the future.
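The counting approach from item 8 can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: fixed-width integer bitmasks stand in for Bitmapset, datum_part[] is a hypothetical array mapping each list datum offset to the partition that accepts it, and NULL-accepting and default partitions are not modeled.

```c
#include <stdint.h>

#define MAX_PARTS 8

/*
 * Hypothetical sketch: datum_part[i] is the partition accepting the i-th
 * list datum, and bit i of ne_offsets is set when some "<>" clause rules
 * that datum out.  Returns a bitmask of partitions whose every datum has
 * been ruled out, i.e. partitions that can be excluded.
 * Assumes ndatums <= 64 and nparts <= MAX_PARTS.
 */
static uint32_t
partitions_excluded_by(const int *datum_part, int ndatums, int nparts,
                       uint64_t ne_offsets)
{
    int      total[MAX_PARTS] = {0};   /* datums allowed per partition */
    int      hit[MAX_PARTS] = {0};     /* datums ruled out per partition */
    uint32_t excluded = 0;

    for (int i = 0; i < ndatums; i++)
    {
        total[datum_part[i]]++;
        if (ne_offsets & (UINT64_C(1) << i))
            hit[datum_part[i]]++;
    }

    for (int p = 0; p < nparts; p++)
    {
        if (total[p] > 0 && hit[p] == total[p])
            excluded |= UINT32_C(1) << p;
    }
    return excluded;
}
```

Because each offset is recorded as a single bit, a value repeated in a long NOT IN list is counted once, which is what removes the need for explicit duplicate checks.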
9. "the same" -> "it"
/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle the same if its negator is indeed a part of the
* partitioning operator family.
*/
Done.
10. in classify_partition_bounding_keys: "0" -> "false"
/* Return if no work to do below. */
if (!will_compute_keys || *constfalse)
return 0;
Likewise for:
if (*constfalse)
return 0;
Have fixed these per an earlier comment in this email.
11. I don't see partition_bound_bsearch used anywhere below the
following comment:
* Generate bounding tuple(s).
*
* We look up partitions in the partition bound descriptor using, say,
* partition_bound_bsearch(), which expects a Datum (or Datums if multi-
* column key). So, extract the same out of the constant argument of
* each clause.
I also don't know what the comment is trying to say.
The comment is no longer very intelligible to me either. I just wanted to
say here that, *elsewhere*, we will use a function like
partition_bound_bsearch() to look up partitions from the clauses we
matched against individual partition key columns. That function expects
the lookup key to be in a Datum array form, not a list-of-clauses form.
So, we must construct the lookup key(s) by extracting constant values out
of the clauses.
I tried to rewrite it that way. Hope that's a bit clearer.
12.
* operator and sets *incl if equality is implied
should be:
* operator and set *incl to true if the operator's strategy is inclusive.
Done that way.
13. What does "the same" mean in:
* and add this one directly to the result. Caller would
* arbitrarily choose one of the many and perform
* partition-pruning with the same. It's possible that mutual
It means "the one that caller would arbitrarily choose of the many that
this function will return to it". Anyway, I changed "the same" to "it".
I think you quite often use "the same" to mean "it". Can you change that?
I guess that's just one of my many odd habits when writing English, all of
which I'm trying to get rid of, but apparently with limited success. Must
try harder. :)
14. Not sure what parameter you're talking about here.
* Evaluate 'leftarg op rightarg' and set *result to its value.
*
* leftarg and rightarg referred to above actually refer to the constant
* operand (Datum) of the clause contained in the parameters leftarg and
* rightarg below, respectively. And op refers to the operator of the
* clause contained in the parameter op below.
I rewrote the above comment block as:
* Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
* order, using the operator of 'op' and set *result to the result of this
* comparison.
Is that any better?
15. "the latter" is normally used when you're referring to the last
thing in a list which was just mentioned. In this case, leftarg_const
and rightarg_const is the list, so "the latter" should mean
rightarg_const, but I think you mean to compare them using the
operator.
* If the leftarg_const and rightarg_const are both of the type expected
* by op's operator, then compare them using the latter.
Rewrote it as:
* We can compare leftarg_const and rightarg_const using op's operator
* only if both are of the type expected by it.
16. There are a few things to improve with the following comment:
/*
* Hash partitioning stores partition keys containing nulls in regular
* partitions. That is, the code that determines the hash partition for
* a given row admits nulls in the partition key when computing the key's
* hash. So, here we treat any IS NULL clauses on partition key columns as
* equality keys, along with any other non-null values coming from equality
* operator clauses.
*/
"admits" is not the correct word here, and "hash" should be "correct",
but there are more mistakes, so might be easier just to rewrite to:
/*
* Since tuples with NULL values in the partition key columns are
* stored in regular partitions, we'll treat any IS NULL clauses here
* as regular equality clauses.
*/
Agreed that your version is better, so went with it.
17. The following example will cause get_partitions_for_keys_hash to misbehave:
create table hashp (a int, b int) partition by hash (a, b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
explain select * from hashp where a = 1 and a is null;
The following code assumes that you'll never get a NULL test for a key
that has an equality test, and ends up trying to prune partitions
thinking we got compatible clauses for both partition keys.
Yeah, I think this example helps spot a problem. I thought we'd never get
to get_partitions_for_keys_hash() for the above query, because we
should've been able to prove much earlier that the particular clause
combination should be always false (a cannot be both non-null 1 and null).
Now, because the planner itself doesn't substitute a constant-false for
that, I taught classify_partition_bounding_keys() to do so. It would now
return constfalse=true if it turns out that clauses in the input list lead
to contradictory nullness condition for a given partition column.
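The contradiction check described above can be sketched with two bitmasks. This is a minimal illustration, assuming plain integer masks in place of the keyisnull and keyisnotnull Bitmapsets; nullness_is_contradictory is a hypothetical name, not a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: bit i of each mask refers to partition key column i.
 * keyisnull collects columns constrained by IS NULL; keyisnotnull collects
 * columns constrained by IS NOT NULL or by a strict operator clause such
 * as "a = 1".  A column present in both masks can never be satisfied, so
 * the whole conjunction is constant-false.
 */
static bool
nullness_is_contradictory(uint32_t keyisnull, uint32_t keyisnotnull)
{
    return (keyisnull & keyisnotnull) != 0;
}
```

For "a = 1 and a is null", column 0 appears in both masks and the function reports a contradiction; for "a = 1 and b is null", the masks are disjoint and pruning can proceed.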
memset(keyisnull, false, sizeof(keyisnull));
for (i = 0; i < partkey->partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
}
/*
* Can only do pruning if we know all the keys and they're all equality
* keys including the nulls that we just counted above.
*/
if (keys->n_eqkeys == partkey->partnatts)
The above code will need to be made smarter. It'll likely crash if you
change "b" to a pass-by-ref type.
Hmm, not sure why. It seems to work:
create table hp (a int, b text) partition by hash (a, b);
create table hp1 partition of hp for values with (modulus 4, remainder 0);
create table hp2 partition of hp for values with (modulus 4, remainder 1);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp4 partition of hp for values with (modulus 4, remainder 2);
insert into hp values (1, 'xxx');
INSERT 0 1
select tableoid::regclass, * from hp;
tableoid | a | b
----------+---+-----
hp1 | 1 | xxx
(1 row)
insert into hp (a) values (1);
INSERT 0 1
insert into hp (b) values ('xxx');
INSERT 0 1
select tableoid::regclass, * from hp where a is null;
tableoid | a | b
----------+---+-----
hp2 | | xxx
(1 row)
select tableoid::regclass, * from hp where b is null;
tableoid | a | b
----------+---+---
hp1 | 1 |
(1 row)
select tableoid::regclass, * from hp where a = 1 and b is null;
tableoid | a | b
----------+---+---
hp1 | 1 |
(1 row)
select tableoid::regclass, * from hp where a is null and b = 'xxx';
tableoid | a | b
----------+---+-----
hp2 | | xxx
(1 row)
18. The following code:
int other_idx = -1;
/*
* Only a designated partition accepts nulls, which if there
* exists one, return the same.
*/
if (partition_bound_accepts_nulls(boundinfo) ||
partition_bound_has_default(boundinfo))
other_idx = partition_bound_accepts_nulls(boundinfo)
? boundinfo->null_index
: boundinfo->default_index;
if (other_idx >= 0)
return bms_make_singleton(other_idx);
else
return NULL;
should be simplified to:
/*
* NULLs may only exist in the NULL partition, or in the
* default, if there's no NULL partition.
*/
if (partition_bound_accepts_nulls(boundinfo))
return bms_make_singleton(boundinfo->null_index);
else if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
return NULL;
Agreed, done that way.
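The simplified control flow from item 18 can be exercised in isolation. The following is a minimal sketch, assuming that a missing NULL or default partition is represented by an index of -1 and that the result is a single partition index rather than a Bitmapset; partition_for_null_key is a hypothetical name.

```c
/*
 * Hypothetical sketch: null_index and default_index are -1 when the table
 * has no such partition.  Returns the index of the only partition that can
 * hold rows whose list-partition key is NULL, or -1 if there is none.
 */
static int
partition_for_null_key(int null_index, int default_index)
{
    if (null_index >= 0)
        return null_index;
    if (default_index >= 0)
        return default_index;
    return -1;
}
```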
19. "exists" -> "are"
* If there are no datums to compare keys with, but there exist
* partitions, it must be the default partition.
Also, instead of writing "it must be the default partition." it should
be better to say "just return the default partition."
OK, done.
20. I don't think the return NULL should ever hit, is it worth putting
a comment to say /* shouldn't happen */
if (boundinfo->ndatums == 0)
{
if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
}
I added a /* shouldn't happen */ comment next to return NULL.
21. The following comment does not explain the situation well:
/*
* boundinfo->ndatums - 1 is the last valid list partition datums
* index.
*/
There's really no possible non-default partition for this case, so
perhaps we should just return the default, if one exists. We do go on
to check the n_maxkeys needlessly for this case. At the very least the
comment should be changed to:
/*
* minkeys values are greater than any non-default partition.
* We'll check that for case below.
*/
but I think it's worth just doing the default partition check there
and returning it, or NULL. It should help reduce confusion.
Yep, done.
Attached v20. Thanks again.
Regards,
Amit
Attachments:
v20-0001-Some-interface-changes-for-partition_bound_-cmp-.patch (text/plain; charset=UTF-8)
From 2390f5d0af6cc365dbbad424ad15afe1ca339803 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v20 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 166 +++++++++++++++++++++++++++++-----------
1 file changed, 123 insertions(+), 43 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..1edbf66eae 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg may contain either a partition bound struct or a Datum array
+ * representing the partition key of a tuple being routed. We simply pass
+ * that down to partition_bound_cmp where it is interpreted appropriately.
*
- * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *is_equal is set to whether the bound at the returned index is exactly
+ * equal to *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
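The search contract documented for partition_bound_bsearch() in the patch above (return the greatest bound that is less than or equal to the probe, or -1 if all bounds are greater) can be illustrated on a plain sorted array. This is a minimal sketch, assuming integers in place of partition bounds and a direct comparison in place of partition_bound_cmp(); bound_bsearch is a hypothetical name.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch over a sorted int array instead of partition bounds.
 * Returns the index of the greatest element <= probe, or -1 if every
 * element is greater; *is_equal reports whether that element equals probe.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int probe, bool *is_equal)
{
    int lo = -1;
    int hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;    /* round up so lo always advances */

        if (bounds[mid] <= probe)
        {
            lo = mid;
            *is_equal = (bounds[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Starting lo at -1 is what lets the function report "no bound qualifies" without a separate check, and it also makes an empty array return -1.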
v20-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 29a14b03741ddb2e11093459c7eb99af35fc8221 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v20 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
---
src/backend/catalog/partition.c | 1987 ++++++++++++++++++++++++++++++++++
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 1996 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1edbf66eae..974febbd12 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,82 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Is the following information initialized? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * This information is extracted from the query's mutually conjunctive operator
+ * clauses, each of whose variable argument is matched to a partition key and
+ * operator is checked to be contained in the corresponding column's partition
+ * operator family.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Values in the following array appear in no
+ * particular order (unlike minkeys and maxkeys below which must appear in
+ * the same order as the partition key columns). n_eqkeys must be equal to
+ * the number of partition keys to be valid (except in the case of hash
+ * partitioning where that's not required). When set, minkeys and maxkeys
+ * are ignored.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+ int n_eqkeys;
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. Values in
+ * the following arrays must appear in the same order as the partition key
+ * columns and may contain values for only a prefix of the partition key
+ * columns. If *_incl is true then the corresponding bound is inclusive
+ * and hence the partition into which the bound falls is to be included in
+ * the set of selected partitions.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ int n_minkeys;
+ bool min_incl;
+
+ Datum maxkeys[PARTITION_MAX_KEYS];
+ int n_maxkeys;
+ bool max_incl;
+
+ /*
+ * Information about nullness of partition keys, either specified
+ * explicitly in the query (in the form of a IS [NOT] NULL clause) or
+ * implied due to the assumption of strictness of the partitioning
+ * operators.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +291,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_excluded_by(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static bool classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses, List **ne_clauses);
+static void remove_redundant_clauses(PartitionKey partkey,
+ int partkeyidx, List *all_clauses,
+ List **result, bool *constfalse);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1690,1887 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine the set of partitions of 'relation' that will satisfy all
+ * the clauses contained in 'partclauses'
+ *
+ * Outputs:
+ * A Bitmapset containing indexes of all selected partitions.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ Bitmapset *result;
+ List *partconstr;
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+
+ Assert(partclauses != NIL);
+
+ /*
+ * If relation is a partition itself, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate a correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partition_bound_has_default(boundinfo) &&
+ (partconstr = RelationGetPartitionQual(relation)) != NIL)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+
+ /* Be careful not to modify the input list. */
+ partclauses = list_concat(list_copy(partclauses), partconstr);
+ }
+
+ result = get_partitions_from_clauses_recurse(relation, rt_index,
+ partclauses);
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ Bitmapset *result;
+ PartScanKeyInfo keys;
+ bool constfalse;
+ List *or_clauses,
+ *ne_clauses;
+ ListCell *lc;
+
+ /*
+ * Try to reduce the set of clauses into a form that
+ * get_partitions_for_keys() can work with.
+ */
+ if (classify_partition_bounding_keys(relation, clauses, rt_index,
+ &keys, &constfalse,
+ &or_clauses, &ne_clauses))
+ {
+ /*
+ * classify_partition_bounding_keys() may have found a pseudo-constant
+ * clause that is false, which the planner did not act on, or it may
+ * itself have found contradictions among the clauses.
+ */
+ if (constfalse)
+ return NULL;
+
+ result = get_partitions_for_keys(relation, &keys);
+ }
+ else
+ {
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (ne_clauses)
+ {
+ Bitmapset *ne_clause_parts;
+
+ ne_clause_parts = get_partitions_excluded_by(relation, ne_clauses);
+
+ /* Remove any matched partitions */
+ result = bms_del_members(result, ne_clause_parts);
+ bms_free(ne_clause_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_clause_args(relation, rt_index,
+ or->args);
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
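
The way get_partitions_from_clauses_recurse() combines its three sources of
information can be shown in isolation. What follows is only a minimal
standalone sketch, not code from the patch: a 64-bit mask stands in for a
Bitmapset (so it assumes at most 64 partitions), and the function name
combine_partition_sets and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit p of each mask represents partition index p.
 *
 * base        - partitions selected by the bounding keys (or all partitions)
 * ne_excluded - partitions ruled out by <> clauses
 * or_sets     - one mask per OR clause, each the union over its arguments
 */
static uint64_t
combine_partition_sets(uint64_t base, uint64_t ne_excluded,
                       const uint64_t *or_sets, int n_or)
{
    uint64_t    result = base;
    int         i;

    assert(n_or >= 0);

    /* Nothing selected in the first place, so nothing more to do. */
    if (result == 0)
        return 0;

    /* Remove partitions excluded by the <> clauses (set difference). */
    result &= ~ne_excluded;

    /* OR clauses are ANDed with everything else (set intersection). */
    for (i = 0; i < n_or; i++)
        result &= or_sets[i];

    return result;
}
```

For example, with four partitions selected by the keys (mask 0xF), partition 1
excluded by a <> clause, and two OR clauses selecting {0,1,2} and {0,2,3}, the
surviving set is {0,2}.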
+
+/*
+ * get_partitions_excluded_by
+ *
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
+ */
+static Bitmapset *
+get_partitions_excluded_by(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *excluded_parts = NULL;
+ Bitmapset *foundoffsets = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
+
+ memset(&arg, 0, sizeof(arg));
+
+ /*
+ * Build a Bitmapset to record the indexes of all datums of the
+ * query that are found in boundinfo.
+ */
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ arg.datums = &datum;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums we found in the query, and another to record the number of
+ * datums permitted in each partition. Once we've counted all this, we
+ * can eliminate any partition where the number of datums found matches
+ * the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition, but the condition below that we must
+ * have found at least 1 datum will ensure that, because in the default
+ * partition's case, both arrays will contain zero.
+ */
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
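
The counting scheme used by get_partitions_excluded_by() may be easier to
follow outside the patch. The following is a minimal standalone sketch under
stated assumptions, not the patch's code: plain arrays stand in for
PartitionBoundInfo and Bitmapset, NULL handling is left out, and the names
excluded_by_ne, datum_part and found are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper.
 *
 * datum_part[d] - index of the partition that accepts list datum d
 * found[d]      - true if the query contains a "key <> datum d" clause
 * excluded[p]   - set to true if every datum accepted by partition p was
 *                 named in some <> clause
 */
static void
excluded_by_ne(const int *datum_part, const bool *found, int ndatums,
               int nparts, bool *excluded)
{
    int        *in_part = calloc((size_t) nparts, sizeof(int));
    int        *n_found = calloc((size_t) nparts, sizeof(int));
    int         i;

    assert(in_part != NULL && n_found != NULL);

    /* Count the datums each partition permits and how many were matched. */
    for (i = 0; i < ndatums; i++)
    {
        in_part[datum_part[i]]++;
        if (found[i])
            n_found[datum_part[i]]++;
    }

    /*
     * A partition with no datums of its own (such as a default partition)
     * has both counts at zero, so requiring n_found > 0 keeps it.
     */
    for (i = 0; i < nparts; i++)
        excluded[i] = (n_found[i] > 0 && n_found[i] >= in_part[i]);

    free(in_part);
    free(n_found);
}
```

With partition 0 accepting two datums that are both named in <> clauses,
partition 1 accepting one datum that is not, partition 2 accepting one datum
that is, and partition 3 accepting no explicit datums, only partitions 0 and 2
are marked as excluded.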
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation
+ * due to the latter's partition constraint, which means we must
+ * not add its partitions to result. But the clause may not
+ * contain this relation's partition key expressions (instead the
+ * parent's), so we could not depend on just calling
+ * get_partitions_from_clauses_recurse(relation, ...) to determine
+ * that the clause indeed prunes all of the relation's partitions.
+ *
+ * Use predicate refutation proof instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ?\
+ (IsA((expr), Var) &&\
+ ((Var *) (expr))->varattno == (partattno)) :\
+ equal((expr), (partexpr)))
+
+/*
+ * classify_partition_bounding_keys
+ * Analyzes partition clauses to collect the equality key or minimum and
+ * maximum bounding keys with which to look up partitions of relation.
+ * Also collects information about the nullness of the individual
+ * partition key columns as the partitions may have certain properties
+ * with respect to null values. Keys and nullness information are stored
+ * in the output argument *keys.
+ *
+ * Clauses in the provided list are assumed to be implicitly ANDed, each of
+ * which is known to match some partition key column. They're mapped to the
+ * individual key columns and for each column, we find constant values that
+ * are compared to the column using operators that are compatible with
+ * partitioning. For example, if there is a clause a = 4 where a is a
+ * partition key column, then 4 is stored as the equality key if = is the
+ * partitioning equality operator. If there are clauses a > 1 and a < 5, then
+ * 1 and 5 are stored as the minimum and maximum bounding keys, if > and < are
+ * partitioning greater and less operators, respectively. If there are
+ * multiple clauses addressing a given column, we first try to check if they
+ * are mutually contradictory and set *constfalse if so. For example, if there
+ * are clauses a = 1 and a = 2 in the list, then clearly both will never be
+ * true. Similarly for a > 1 and a < 0. For clauses containing ordering
+ * operators that are non-contradictory, we try to find the one that is the
+ * most restrictive and discard others. For example, of a > 1, a > 2, and
+ * a >= 5, the last one is the most restrictive and so 5 is the best minimum
+ * bound (which also happens to be inclusive), so it is kept while discarding
+ * both a > 1 and a > 2.
+ *
+ * For multi-column keys, an equality key needs to contain values corresponding
+ * to *all* partition key columns in the range partitioning case, whereas it's
+ * not necessary for hash partitioning. Actually, the latter requires that
+ * the remaining columns are covered by IS NULL clauses, but that's not checked
+ * in this function. Minimum and maximum bound keys are allowed to contain
+ * values for only a prefix of the partition key columns.
+ *
+ * Certain kinds of clauses are not immediately handled within this function
+ * and are instead returned to the caller for further processing. That
+ * includes OR clauses (both those encountered in the input list and those
+ * generated from ScalarArrayOpExpr clauses in the input list that have useOr
+ * set to true), which are returned to the caller in *or_clauses and clauses
+ * containing a <> operator (whose negator is a valid *list* partitioning
+ * equality operator), which are returned to the caller in *ne_clauses.
+ *
+ * True is returned if *keys contains valid information upon return or if
+ * *constfalse is set to true.
+ */
+static bool
+classify_partition_bounding_keys(Relation relation, List *clauses,
+ int rt_index,
+ PartScanKeyInfo *keys, bool *constfalse,
+ List **or_clauses,
+ List **ne_clauses)
+{
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ int i;
+ ListCell *lc;
+ List *keyclauses_all[PARTITION_MAX_KEYS],
+ *keyclauses[PARTITION_MAX_KEYS];
+ bool will_compute_keys = false;
+ Bitmapset *keyisnull = NULL,
+ *keyisnotnull = NULL;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int n_keynullness = 0;
+
+ *or_clauses = NIL;
+ *ne_clauses = NIL;
+ *constfalse = false;
+ memset(keyclauses_all, 0, sizeof(keyclauses_all));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause;
+ ListCell *partexprs_item;
+
+ if (IsA(lfirst(lc), RestrictInfo))
+ {
+ RestrictInfo *rinfo = lfirst(lc);
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return true;
+ }
+ }
+ else
+ clause = (Expr *) lfirst(lc);
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ *or_clauses = lappend(*or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+
+ /*
+ * A non-zero partattno refers to a simple column reference that
+ * will be matched against varattno of a Var appearing in the clause.
+ * partattno == 0 refers to an arbitrary expression, in which case
+ * we fetch the current expression from PartitionKey.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ /* Copy to avoid overwriting the relcache's content. */
+ partexpr = copyObject(lfirst(partexprs_item));
+
+ /*
+ * Expressions stored in PartitionKey in the relcache all
+ * contain a dummy varno (that is, 1), but we must switch to
+ * the RT index of the table in this query so that it can be
+ * correctly matched to the expressions coming from the query.
+ */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ constexpr = leftop;
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * Flip the left and right args if we have to, because the
+ * code which extracts the constant value to use for
+ * partition-pruning expects to find it as the rightop of the
+ * clause. (See below in this function.)
+ */
+ if (constexpr == rightop)
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+ Oid commutator = get_commutator(opclause->opno);
+
+ /*
+ * Caller must have made sure to check that the commutator
+ * indeed exists.
+ */
+ Assert(OidIsValid(commutator));
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by().
+ */
+ if (is_ne_listp)
+ *ne_clauses = lappend(*ne_clauses, pc);
+ else
+ {
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ if (bms_is_member(i, keyisnull))
+ {
+ *constfalse = true;
+ return true;
+ }
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = linitial(saop->args),
+ *rightop = lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ /* Clause does not match this partition key. */
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is not
+ * listed as part of any operator family. We can still handle
+ * it if its negator is a member of the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
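+ /* The lookup below raises an error for non-members. */
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))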
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+
+ /*
+ * Use a separate counter here, because 'i' is the index of
+ * the partition key column in the enclosing loop.
+ */
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *elem_clause;
+
+ if (rightop->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ *or_clauses = lappend(*or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ if (bms_is_member(i, keyisnotnull))
+ {
+ *constfalse = true;
+ return true;
+ }
+ keyisnull = bms_add_member(keyisnull, i);
+ }
+ else
+ keyisnotnull = bms_add_member(keyisnotnull, i);
+ n_keynullness++;
+ will_compute_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ keyclauses_all[i] = lappend(keyclauses_all[i], pc);
+ will_compute_keys = true;
+ }
+ }
+ }
+
+ /* Return if no work to do below. */
+ if (!will_compute_keys)
+ return false;
+
+ /*
+ * Try to eliminate redundant keys. In the process, we might find out
+ * that clauses are mutually contradictory and hence can never be true
+ * for any rows.
+ */
+ memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ remove_redundant_clauses(partkey, i,
+ keyclauses_all[i], &keyclauses[i],
+ constfalse);
+ if (*constfalse)
+ return true;
+ }
+
+ /*
+ * Generate bounding tuple(s).
+ *
+ * Eventually, callers will use a function like partition_bound_bsearch()
+ * to look up partitions from the clauses we matched against individual
+ * partition key columns. Those functions expect the lookup key to be in a
+ * Datum array form, not a list-of-clauses form. So, we must construct the
+ * lookup key(s) by extracting constant values out of the clauses.
+ *
+ * Based on the strategies of the clause operators (=, </<=, >/>=), try to
+ * construct tuples of those datums that serve as the exact look up tuple
+ * or tuples that serve as minimum and maximum bound. If we find datums
+ * for all partition key columns that appear in = operator clauses, then
+ * we have the exact match look up tuple, which will be used to match just
+ * one partition. If the last datum in a tuple comes from a clause
+ * containing </<= or >/>= operator, then that constitutes the maximum
+ * or minimum bound tuple, respectively. There is one exception -- if
+ * we have a tuple containing values for only a prefix of partition key
+ * columns, where none of its values come from a </<= or >/>= operator
+ * clause, we still consider such tuple as both the minimum and maximum
+ * bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, keyclauses[i])
+ {
+ PartClause *clause = lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing equality
+ * operators for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = keyisnull;
+ keys->keyisnotnull = keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || n_keynullness > 0);
+}
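
The header comment of classify_partition_bounding_keys() says that of a > 1,
a > 2 and a >= 5 only the last is kept as the minimum bound. A minimal
standalone sketch of that selection follows. It is an illustration only:
it assumes plain integer keys, whereas the patch compares constants through
the partitioning operator family, and the names LowerBound,
lower_bound_tighter and best_lower_bound are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of a clause "a > value" or "a >= value". */
typedef struct LowerBound
{
    int         value;
    bool        incl;       /* true for >=, false for > */
} LowerBound;

/* Returns true if 'a' admits strictly fewer values than 'b'. */
static bool
lower_bound_tighter(LowerBound a, LowerBound b)
{
    if (a.value != b.value)
        return a.value > b.value;

    /* Same constant: "a > v" is tighter than "a >= v". */
    return !a.incl && b.incl;
}

/* Picks the most restrictive of n (n > 0) lower bounds. */
static LowerBound
best_lower_bound(const LowerBound *bounds, int n)
{
    LowerBound  best;
    int         i;

    assert(n > 0);
    best = bounds[0];
    for (i = 1; i < n; i++)
    {
        if (lower_bound_tighter(bounds[i], best))
            best = bounds[i];
    }

    return best;
}
```

Given the three clauses from the comment, the sketch keeps 5 as an inclusive
bound. Given a >= 5 and a > 5, it keeps the exclusive one.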
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'op' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ PartOpStrategy result;
+
+ *incl = false; /* overwritten as appropriate below */
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = PART_OP_EQUAL;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ result = PART_OP_LESS;
+ break;
+ case BTEqualStrategyNumber:
+ *incl = true;
+ result = PART_OP_EQUAL;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ result = PART_OP_GREATER;
+ break;
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * For a given partition key column, find the most restrictive of the clauses
+ * contained in all_clauses that are known to match the column and add it to
+ * *result.
+ *
+ * If it is found that two clauses are mutually contradictory, *constfalse
+ * is set to true before returning.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey, int partkeyidx,
+ List *all_clauses, List **result,
+ bool *constfalse)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ bool test_result;
+
+ *result = NIL;
+
+ hash_clause = NULL;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[partkeyidx],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've matched
+ * a clause and found another clause whose constant operand doesn't
+ * match the constant operand of the former, then we have found
+ * mutually contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ *result = lappend(*result, cur);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set *constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ *constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ *result = lappend(*result, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ *result = lappend(*result, hash_clause);
+ return;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, partkeyidx,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ *constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter instead
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, partkeyidx,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the result.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ if (btree_clauses[s])
+ *result = lappend(*result, btree_clauses[s]);
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'op' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ * We may not be able to perform the comparison if operand values are
+ * unavailable and/or types of operands are incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * We can compare leftarg_const and rightarg_const using op's operator
+ * only if both are of the type expected by it.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions of 'rel' that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_bound_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_bound_bsearch works. If the bound at maxoff exactly
+ * matches maxkey (is_equal), but the maxkey is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_bound_bsearch would've returned the offset of just one of
+ * those. If minkey is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->min_incl
+ ? minoff - 1 : minoff + 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->max_incl
+ ? maxoff + 1 : maxoff - 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but that may not be the case. If it is actually the
+ * upper bound of an unassigned range of values, move minoff/maxoff to
+ * the adjacent bound, which must be the upper bound of a valid
+ * partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..8423c6e886 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -73,4 +73,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
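
To make the clause reduction performed by remove_redundant_clauses() in the patch above easier to follow, here is a minimal standalone sketch of the same idea for the btree case. It is not code from the patch: it assumes plain integer constants and ordinary integer comparison instead of PartClause, opfamily lookups and partition_cmp_args(), and the names Clause, apply_strategy, reduce_clauses and the ST_* constants are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the five btree strategies: <, <=, =, >=, > */
enum { ST_LT, ST_LE, ST_EQ, ST_GE, ST_GT, ST_COUNT };

/* Simplified clause "col <op> value"; the operator is given by strategy. */
typedef struct
{
    int strategy;
    int value;
} Clause;

/* Evaluate "l <op> r" for the operator of the given strategy. */
static bool
apply_strategy(int strategy, int l, int r)
{
    switch (strategy)
    {
        case ST_LT: return l < r;
        case ST_LE: return l <= r;
        case ST_EQ: return l == r;
        case ST_GE: return l >= r;
        default:    return l > r;
    }
}

/*
 * Keep at most one clause per strategy in best[].  Returns false if two
 * clauses were found to be mutually contradictory.
 */
static bool
reduce_clauses(const Clause *clauses, int nclauses,
               const Clause *best[ST_COUNT])
{
    int s, i;

    for (s = 0; s < ST_COUNT; s++)
        best[s] = NULL;

    /* Pass 1: per strategy, keep the more restrictive clause. */
    for (i = 0; i < nclauses; i++)
    {
        const Clause *cur = &clauses[i];

        s = cur->strategy;
        assert(s >= 0 && s < ST_COUNT);
        if (best[s] == NULL ||
            apply_strategy(s, cur->value, best[s]->value))
            best[s] = cur;
        else if (s == ST_EQ)
            return false;       /* e.g. a = 5 AND a = 7 */
    }

    /* Pass 2: an equality clause subsumes or contradicts all others. */
    if (best[ST_EQ])
    {
        for (s = 0; s < ST_COUNT; s++)
        {
            if (s == ST_EQ || best[s] == NULL)
                continue;
            if (!apply_strategy(s, best[ST_EQ]->value, best[s]->value))
                return false;   /* e.g. a = 5 AND a < 5 */
            best[s] = NULL;     /* e.g. a = 3 makes a < 5 redundant */
        }
    }

    /* Pass 3: keep only one of <, <= and only one of >, >=. */
    if (best[ST_LT] && best[ST_LE])
    {
        if (best[ST_LT]->value <= best[ST_LE]->value)
            best[ST_LE] = NULL;
        else
            best[ST_LT] = NULL;
    }
    if (best[ST_GT] && best[ST_GE])
    {
        if (best[ST_GT]->value >= best[ST_GE]->value)
            best[ST_GE] = NULL;
        else
            best[ST_GT] = NULL;
    }
    return true;
}
```

For example, given a < 5, a < 3 and a <= 3, the sketch keeps only a < 3; given a = 5 and a < 5 it reports a contradiction. Note that the sketch, like the function it imitates as far as I can tell, does not try to detect a contradiction between a lower and an upper bound (say a < 3 AND a > 5); that would have to be noticed elsewhere.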
v20-0003-Move-some-code-of-set_append_rel_size-to-separat.patch
From cbdae2e89deeb73cc3fa9fa8f3ddc3b2d59622e6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v20 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition RelOptInfo
from the information in the parent's RelOptInfo into a separate function.
It will need to be called by the pairwise-join related code to minimally
initialize the partitions that earlier planning would have considered
pruned and hence left untouched. That does not happen currently, because
the current pruning method touches each partition (setting its basic
properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c5304b712e..fee078a9c7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 725694f570..9b4288ad92 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -300,5 +300,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
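The attr_needed handling in set_basic_child_rel_properties() above can be sketched outside the planner roughly as follows. This is only an illustration under simplifying assumptions: relids_t stands in for a Relids set, translated[] stands in for appinfo->translated_vars (0 meaning "column dropped from the parent"), and copy_attr_needed() is a hypothetical name, not a function in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Relids set: one bit per relation. */
typedef unsigned int relids_t;

/*
 * Copy the per-attribute "needed by" sets from parent to child.
 *
 * Both arrays are indexed by (attno - min_attr), as in the patch.
 * translated[i] is the child attribute number that parent user attribute
 * i + 1 maps to, or 0 if that parent column was dropped.
 */
void
copy_attr_needed(const relids_t *parent_needed, int parent_min, int parent_max,
                 relids_t *child_needed, int child_min,
                 const int *translated)
{
    for (int attno = parent_min; attno <= parent_max; attno++)
    {
        int         index = attno - parent_min;
        relids_t    needed = parent_needed[index];

        if (attno <= 0)
        {
            /* System attributes are numbered identically in parent and child. */
            assert(parent_min == child_min);
            child_needed[index] = needed;
        }
        else
        {
            int         child_attno = translated[attno - 1];

            if (child_attno == 0)
            {
                /* Dropped column: nothing in the query can reference it. */
                assert(needed == 0);
                continue;
            }
            child_needed[child_attno - child_min] = needed;
        }
    }
}
```

The point of the sketch is that the same set is reused for parent and child; only the array slot changes when the child's column order differs from the parent's.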
v20-0004-More-refactoring-around-partitioned-table-Append.patch
From 4d70df628b481dc8046efe0dacfad06a32a98fe6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v20 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 ++++++-
4 files changed, 115 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fee078a9c7..8f761a77e8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7b52dadd81..b0f6051618 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..4b5d50eb2c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 71689b8ed6..63623f2687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
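The reworked get_partitioned_child_rels_for_join() in the patch above walks the members of join_relids and concatenates each member's live_partitioned_rels. A stand-alone sketch of that loop shape follows; SketchRel, next_member() and collect_live_partitioned() are hypothetical simplifications (a plain bitmask instead of a Bitmapset, fixed arrays instead of List), not planner code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 32

/* Hypothetical, much simplified stand-in for RelOptInfo. */
typedef struct SketchRel
{
    int         nlive;                  /* entries used in live_partitioned[] */
    int         live_partitioned[4];    /* RT indexes, the rel's own first */
} SketchRel;

/* Return the smallest member of 'set' greater than 'prev', or -1 if none. */
int
next_member(unsigned int set, int prev)
{
    for (int i = prev + 1; i < MAX_RELS; i++)
    {
        if (set & (1u << i))
            return i;
    }
    return -1;
}

/*
 * Concatenate the live partitioned RT indexes of every member of
 * join_relids into out[]; returns the number of entries written.
 */
int
collect_live_partitioned(SketchRel *const *rel_array, int array_size,
                         unsigned int join_relids, int *out, int out_cap)
{
    int         n = 0;
    int         relid = -1;

    while ((relid = next_member(join_relids, relid)) >= 0)
    {
        const SketchRel *rel;

        /* Ignore indexes that fall outside the array or have no rel. */
        if (relid >= array_size)
            continue;
        rel = rel_array[relid];
        if (rel == NULL)
            continue;

        for (int i = 0; i < rel->nlive; i++)
        {
            assert(n < out_cap);
            out[n++] = rel->live_partitioned[i];
        }
    }
    return n;
}
```

Compared with scanning a global list such as root->pcinfo_list, the work here is proportional to the rels actually being joined, and pruned partitioned children never show up because they were never added to the per-rel list.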
v20-0005-Teach-planner-to-use-get_partitions_from_clauses.patch
From 1fa01136470a6572835c417df3f4feb7167df871 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v20 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote, Dilip Kumar
---
src/backend/optimizer/path/allpaths.c | 399 ++++++++++++++++++++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 917 insertions(+), 78 deletions(-)
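To make the cost argument in the commit message concrete, here is a toy comparison for a single equality clause on an integer range-partitioned key. It is a sketch only: prune_linear() and prune_bsearch() are hypothetical names, the bounds array is an assumed simplification of the parent's bound information, and none of this reflects how get_partitions_from_clauses() is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Partition i holds keys in [bounds[i], bounds[i + 1]); bounds[] has
 * nparts + 1 strictly increasing entries.
 */

/* Test every partition in turn, like per-partition constraint exclusion. */
int
prune_linear(const int *bounds, int nparts, int key)
{
    for (int i = 0; i < nparts; i++)
    {
        if (key >= bounds[i] && key < bounds[i + 1])
            return i;
    }
    return -1;
}

/* Look the key up once in the parent's bounds using binary search. */
int
prune_bsearch(const int *bounds, int nparts, int key)
{
    int         lo = 0;
    int         hi = nparts;

    assert(nparts > 0);
    if (key < bounds[0] || key >= bounds[nparts])
        return -1;

    /* Invariant: bounds[lo] <= key < bounds[hi]. */
    while (hi - lo > 1)
    {
        int         mid = lo + (hi - lo) / 2;

        if (key < bounds[mid])
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```

Both functions return the index of the one partition that can contain the key, or -1 if no partition can; the second does so in a logarithmic number of comparisons and without re-matching the clause against each partition.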
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8f761a77e8..e7c7a6e7a0 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,390 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Return the list of partitions of rel that pass the clauses mentioned
+ * in rel->baserestrictinfo. An empty list is returned if no matching
+ * partitions were found.
+ *
+ * Returned list contains the AppendRelInfos of chosen partitions.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *partclauses;
+ bool contains_const,
+ constfalse;
+ List *result = NIL;
+ int i;
+ Relation parent;
+ PartitionDesc partdesc;
+ Bitmapset *partindexes;
+
+ /*
+ * Get the clauses that match the partition key. It's also a good idea
+ * to check if the matched clauses contain constant values that can be
+ * used for pruning and go to get_partitions_from_clauses() only if so.
+ * If rel->baserestrictinfo might contain mutually contradictory clauses,
+ * also find out about that.
+ */
+ partclauses = match_clauses_to_partkey(root, rel, rel->baserestrictinfo,
+ &contains_const, &constfalse);
+
+ /* We're done here. */
+ if (constfalse)
+ return NIL;
+
+ parent = heap_open(rte->relid, NoLock);
+ partdesc = RelationGetPartitionDesc(parent);
+
+ if (partclauses != NIL && contains_const)
+ partindexes = get_partitions_from_clauses(parent, rel->relid,
+ partclauses);
+ else
+ {
+ /*
+ * There are no clauses that are useful to prune any partitions, so
+ * scan all partitions.
+ */
+ partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ RangeTblEntry *childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clauses_to_partkey
+ * Match clauses with rel's partition key
+ *
+ * Returned list contains clauses matched to the partition key columns and
+ * *contains_const and *constfalse are set as described below.
+ *
+ * For an individual clause to match with a partition key column, the clause
+ * must be an operator clause of the form (partkey op const) or (const op
+ * partkey); the latter only if a suitable commutator exists. Furthermore,
+ * the operator must be strict and its input collation must match the partition
+ * collation. The aforementioned "const" means any expression that doesn't
+ * involve a volatile function or a Var of this relation. We allow Vars
+ * belonging to other relations (for example, if the clause is a join clause),
+ * but they are treated as parameters whose values are not known now, so cannot
+ * be used for partition pruning right within the planner. It's the
+ * responsibility of higher code levels to manage restriction and join clauses
+ * appropriately. If a NullTest against a partition key is encountered, it's
+ * added to the result as well.
+ *
+ * *contains_const is set if at least one matched clause contains a constant
+ * operand or is a nullness test. *constfalse is set if the input list
+ * contains a pseudo-constant RestrictInfo with false value.
+ */
+static List *
+match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse)
+{
+ PartitionScheme partscheme = rel->part_scheme;
+ List *result = NIL;
+ ListCell *lc;
+
+ *contains_const = false;
+ *constfalse = false;
+
+ Assert (partscheme != NULL);
+
+ /* Make a copy, because we may scribble on it below. */
+ clauses = list_copy(clauses);
+
+ foreach(lc, clauses)
+ {
+ Node *member = lfirst(lc);
+ Expr *clause;
+ int i;
+
+ if (IsA(member, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) member;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ (IsA(clause, Const) &&
+ ((((Const *) clause)->constisnull) ||
+ !DatumGetBool(((Const *) clause)->constvalue))))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+ else
+ clause = (Expr *) member;
+
+ /*
+ * For a BoolExpr, we should try to match each of its args with the
+ * partition key as described below for each type.
+ */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ /*
+ * For each of OR clause's args, call this function
+ * recursively with a given arg as the only member in the
+ * input list and see if it's returned as matching the
+ * partition key. Add the OR clause to the result iff at
+ * least one of its args contains a matching clause.
+ */
+ BoolExpr *orclause = (BoolExpr *) clause;
+ ListCell *lc1;
+ bool arg_matches_key = false,
+ matched_arg_contains_const = false,
+ all_args_constfalse = true;
+
+ foreach (lc1, orclause->args)
+ {
+ Node *arg = lfirst(lc1);
+ bool contains_const1,
+ constfalse1;
+
+ if (match_clauses_to_partkey(root, rel, list_make1(arg),
+ &contains_const1,
+ &constfalse1) != NIL)
+ {
+ arg_matches_key = true;
+ matched_arg_contains_const = contains_const1;
+ }
+
+ /* We got at least one arg that is not constant false. */
+ if (!constfalse1)
+ all_args_constfalse = false;
+ }
+
+ if (arg_matches_key)
+ {
+ result = lappend(result, clause);
+ *contains_const = matched_arg_contains_const;
+ }
+
+ /* OR clause is "constant false" if all of its args are. */
+ *constfalse = all_args_constfalse;
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Since the clause is itself implicitly ANDed with other
+ * clauses in the input list, queue the args to be processed
+ * later as if they were part of the original input list.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < partscheme->partnatts; i++)
+ {
+ Node *partkey = linitial(rel->partexprs[i]);
+ Oid partopfamily = partscheme->partopfamily[i],
+ partcoll = partscheme->partcollation[i];
+
+ /*
+ * Check if the clause matches the partition key and add it to
+ * the result list if other things such as operator input
+ * collation, strictness, etc. look fine.
+ */
+ if (is_opclause(clause))
+ {
+ Expr *constexpr,
+ *leftop,
+ *rightop;
+ Relids constrelids;
+ Oid expr_op,
+ expr_coll;
+
+ leftop = (Expr *) get_leftop(clause);
+ rightop = (Expr *) get_rightop(clause);
+ expr_op = ((OpExpr *) clause)->opno;
+ expr_coll = ((OpExpr *) clause)->inputcollid;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ if (equal(leftop, partkey))
+ {
+ constexpr = rightop;
+ constrelids = pull_varnos((Node *) rightop);
+ }
+ else if (equal(rightop, partkey))
+ {
+ constexpr = leftop;
+ constrelids = pull_varnos((Node *) leftop);
+ expr_op = get_commutator(expr_op);
+
+ /*
+ * If no commutator exists, cannot flip the qual's args,
+ * so give up.
+ */
+ if (!OidIsValid(expr_op))
+ continue;
+ }
+ else
+ /* Neither argument matches the partition key. */
+ continue;
+
+ /*
+ * Useless if what we're thinking of as a constant is actually
+ * a Var coming from this relation.
+ */
+ if (bms_is_member(rel->relid, constrelids))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, expr_coll))
+ continue;
+
+ /*
+ * Only allow strict operators, so that the behavior with null
+ * arguments is well defined.
+ */
+ if (!op_strict(expr_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Everything seems to be fine, so add it to the list of
+ * clauses we will use for pruning.
+ */
+ result = lappend(result, clause);
+
+ if (!*contains_const)
+ *contains_const = IsA(constexpr, Const);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Node *leftop = (Node *) linitial(saop->args),
+ *rightop = (Node *) lsecond(saop->args);
+
+ if (IsA(leftop, RelabelType))
+ leftop = (Node *) ((RelabelType *) leftop)->arg;
+ if (!equal(leftop, partkey))
+ continue;
+
+ /* Check if saop_op is compatible with partitioning. */
+ if (!op_strict(saop_op))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop_coll))
+ continue;
+
+ /* OK to add to the result. */
+ result = lappend(result, clause);
+ if (IsA(eval_const_expressions(root, rightop), Const))
+ *contains_const = true;
+ else
+ *contains_const = false;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Node *arg = (Node *) nulltest->arg;
+
+ if (equal(arg, partkey))
+ {
+ result = lappend(result, nulltest);
+ /* A Nullness test can be used right away. */
+ *contains_const = true;
+ }
+ }
+ /*
+ * Certain Boolean conditions have a special shape, which we
+ * accept if the partitioning opfamily accepts Boolean conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) || not_clause((Node *) clause)))
+ {
+ /*
+ * Only accept those for pruning that appear to be
+ * IS [NOT] TRUE/FALSE.
+ */
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+ Expr *arg = btest->arg;
+
+ if (btest->booltesttype != IS_UNKNOWN &&
+ btest->booltesttype != IS_NOT_UNKNOWN &&
+ equal((Node *) arg, partkey))
+ result = lappend(result, clause);
+ }
+ else if (IsA(clause, Var))
+ {
+ if (equal((Node *) clause, partkey))
+ result = lappend(result, clause);
+ }
+ else
+ {
+ Node *arg = (Node *) get_notclausearg((Expr *) clause);
+
+ if (equal(arg, partkey))
+ result = lappend(result, clause);
+ }
+
+ *contains_const = true;
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +1282,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1d152c514e..d9249f4c33 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1397,6 +1397,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8c60b35068..c103deb21b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1247,22 +1246,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1929,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 63623f2687..855d51ea09 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not same as the collation implied by column's type, store
+ * the same separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..83e60814f7 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..13b12078bf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 16 January 2018 at 21:08, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/12 12:30, David Rowley wrote:
8. The code in get_partitions_from_ne_clauses() does perform quite a
few nested loops. I think a simpler way would be to track the
offsets you've seen in a Bitmapset. This would save you having to
check for duplicates, as an offset can only contain a single datum.
You'd just need to build a couple of arrays after that, one to sum up
the offsets found per partition, and one for the total datums allowed
in the partition. If the numbers match then you can remove the
partition.
I've written this and attached it to this email. It saves about 50
lines of code and should perform much better for complex cases, for
example, a large NOT IN list. This also implements #6.
I liked your patch, so incorporated it, except, I feel slightly
uncomfortable about the new name that you chose for the function because
it sounds a bit generic.
You're right. I only renamed it because I inverted the meaning of the
function in the patch. It no longer did
"get_partitions_from_ne_clauses", it did the opposite and gave the
partitions which can't match. Please feel free to think of a new
better name. Is "get_partitions_excluded_by_ne_clauses" too long?
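The counting scheme described in item 8 can be sketched in standalone C. This is only an illustration, not the attached patch: ListBounds, find_offset() and parts_excluded_by_ne() are hypothetical names, a plain 64-bit mask stands in for a Bitmapset, the key is a single int column, and default or NULL-accepting partitions are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTS 16

/*
 * Hypothetical stand-in for a list partition bound descriptor: datums[] is
 * sorted, and part_of[i] is the index of the partition accepting datums[i].
 */
typedef struct ListBounds
{
    int         ndatums;
    int         nparts;
    const int  *datums;
    const int  *part_of;
} ListBounds;

/* Binary search for value; returns its offset in datums[], or -1. */
static int
find_offset(const ListBounds *b, int value)
{
    int         lo = 0;
    int         hi = b->ndatums - 1;

    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (b->datums[mid] == value)
            return mid;
        if (b->datums[mid] < value)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/*
 * Given the constants of "key <> const" clauses, return a bitmask of the
 * partitions that cannot contain a matching row, i.e. those whose accepted
 * datums have all been excluded.
 */
uint32_t
parts_excluded_by_ne(const ListBounds *b, const int *ne_values, int nvalues)
{
    uint64_t    seen = 0;       /* offsets excluded so far; dedups for free */
    int         total[MAX_PARTS] = {0};
    int         hit[MAX_PARTS] = {0};
    uint32_t    excluded = 0;
    int         i;

    assert(b->ndatums <= 64 && b->nparts <= MAX_PARTS);

    /* Record each excluded datum's offset; repeated values set the same bit. */
    for (i = 0; i < nvalues; i++)
    {
        int         off = find_offset(b, ne_values[i]);

        if (off >= 0)
            seen |= UINT64_C(1) << off;
    }

    /* Count datums allowed per partition and how many were excluded. */
    for (i = 0; i < b->ndatums; i++)
    {
        total[b->part_of[i]]++;
        if (seen & (UINT64_C(1) << i))
            hit[b->part_of[i]]++;
    }

    /* A partition can be removed only if every one of its datums was hit. */
    for (i = 0; i < b->nparts; i++)
    {
        if (total[i] > 0 && hit[i] == total[i])
            excluded |= UINT32_C(1) << i;
    }
    return excluded;
}
```

Because the mask deduplicates offsets, a long NOT IN list costs one lookup per constant plus one pass over the bound datums, with no nested loops over clauses.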
I think you quite often use "the same" to mean "it". Can you change that?
I guess that's just one of my many odd habits when writing English, all of
which I'm trying to get rid of, but apparently with limited success. Must
try harder. :)
Oops, on re-reading that it sounded as though I was asking you to
change some habit, but I just meant the comments. I understand there
will be places that use English where that's normal. It's just I don't
recall seeing that in PostgreSQL code before. American English is
pretty much the standard for the project, despite that not always
being strictly applied (e.g. we have a command called ANALYSE which is
an alias for ANALYZE). I always try and do my best to spell words in
American English (which is not where I'm from), which for me stretches
about as far as putting 'z' in the place of some of my 's'es.
I rewrote the above comment block as:
* Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
* order, using the operator of 'op' and set *result to the result of this
* comparison.
Is that any better?
Sounds good.
15. "the latter" is normally used when you're referring to the last
thing in a list which was just mentioned. In this case, leftarg_const
and rightarg_const is the list, so "the latter" should mean
rightarg_const, but I think you mean to compare them using the
operator.
* If the leftarg_const and rightarg_const are both of the type expected
* by op's operator, then compare them using the latter.
Rewrote it as:
* We can compare leftarg_const and rightarg_const using op's operator
* only if both are of the type expected by it.
I'd probably write "expected type." instead of "type expected by it."
17. The following example will cause get_partitions_for_keys_hash to misbehave:
create table hashp (a int, b int) partition by hash (a, b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
explain select * from hashp where a = 1 and a is null;
The following code assumes that you'll never get a NULL test for a key
that has an equality test, and ends up trying to prune partitions
thinking we got compatible clauses for both partition keys.
Yeah, I think this example helps spot a problem. I thought we'd never get
to get_partitions_for_keys_hash() for the above query, because we
should've been able to prove much earlier that the particular clause
combination should be always false (a cannot be both non-null 1 and null).
Now, because the planner itself doesn't substitute a constant-false for
that, I taught classify_partition_bounding_keys() to do so. It would now
return constfalse=true if it turns out that clauses in the input list lead
to contradictory nullness condition for a given partition column.
memset(keyisnull, false, sizeof(keyisnull));
for (i = 0; i < partkey->partnatts; i++)
{
    if (bms_is_member(i, keys->keyisnull))
    {
        keys->n_eqkeys++;
        keyisnull[i] = true;
    }
}
/*
 * Can only do pruning if we know all the keys and they're all equality
 * keys including the nulls that we just counted above.
 */
if (keys->n_eqkeys == partkey->partnatts)
The above code will need to be made smarter. It'll likely crash if you
change "b" to a pass-by-ref type.
Hmm, not sure why. It seems to work:
Yeah, works now because you've added new code to test for
contradictions in the quals, e.g. a = 1 and a is null is now rejected
as constfalse.
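The contradiction test discussed above can be sketched in standalone C. This is an illustration only, not the code in classify_partition_bounding_keys(): KeyClause, ClauseKind and clauses_are_contradictory() are hypothetical names, keys are numbered below 32, and the equality operator is assumed to be strict, so "a = 1" implies that a is not null.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum ClauseKind
{
    CLAUSE_EQ,                  /* key = const, assumed strict */
    CLAUSE_IS_NULL,             /* key IS NULL */
    CLAUSE_IS_NOT_NULL          /* key IS NOT NULL */
} ClauseKind;

/* Hypothetical, already-classified clause on partition key number keyno. */
typedef struct KeyClause
{
    int         keyno;
    ClauseKind  kind;
} KeyClause;

/*
 * Return true if the AND-ed clauses require some key to be both null and
 * non-null, in which case no partition needs to be scanned.
 */
bool
clauses_are_contradictory(const KeyClause *clauses, int nclauses)
{
    uint32_t    keyisnull = 0;
    uint32_t    keyisnotnull = 0;
    int         i;

    for (i = 0; i < nclauses; i++)
    {
        uint32_t    bit = UINT32_C(1) << clauses[i].keyno;

        if (clauses[i].kind == CLAUSE_IS_NULL)
            keyisnull |= bit;
        else
            keyisnotnull |= bit;    /* strict equality or IS NOT NULL */
    }

    /* Any key present in both sets makes the whole conjunction false. */
    return (keyisnull & keyisnotnull) != 0;
}
```

Collecting both sets before comparing them keeps the result independent of clause order, so "a is null and a = 1" is rejected just like "a = 1 and a is null".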
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2018 at 21:08, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached v20. Thanks again.
Thanks for working on v20. I've had a look over part of it and I've
written down the following:
1. The following comment is not correct
/*
* Equality look up key. Values in the following array appear in no
* particular order (unlike minkeys and maxkeys below which must appear in
* the same order as the partition key columns).
These must be in partition key order, just like the others.
This part is not true either:
* the same order as the partition key columns). n_eqkeys must be equal to
* the number of partition keys to be valid (except in the case of hash
* partitioning where that's not required). When set, minkeys and maxkeys
* are ignored.
rangep2 is pruned just fine from the following:
create table rangep (a int, b int) partition by range (a,b);
create table rangep1 partition of rangep for values from (1,10) to (1,20);
create table rangep2 partition of rangep for values from (2,10) to (2,20);
explain select * from rangep where a = 1;
                          QUERY PLAN
---------------------------------------------------------------
 Append  (cost=0.00..38.25 rows=11 width=8)
   ->  Seq Scan on rangep1  (cost=0.00..38.25 rows=11 width=8)
         Filter: (a = 1)
(3 rows)
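Why an equality on only the first key column is enough to prune here can be sketched in standalone C. This is an illustration under simplifying assumptions, not the patch's bound-search code: RangePart and range_part_may_match_prefix() are hypothetical names, both key columns are ints, and bounds are finite.

```c
#include <stdbool.h>

/* Hypothetical two-column range partition: FROM (lower) TO (upper). */
typedef struct RangePart
{
    int         lower[2];
    int         upper[2];
} RangePart;

/*
 * Conservative test for "a = value" using only the first key column.
 * Rows are ordered by (a, b), so a partition can hold rows with a = value
 * only if value lies between the first columns of its two bounds.  The
 * upper comparison is inclusive because the exclusive upper bound applies
 * to the whole (a, b) tuple, not to a alone.
 */
bool
range_part_may_match_prefix(const RangePart *p, int value)
{
    return p->lower[0] <= value && value <= p->upper[0];
}
```

With the bounds from the example, rangep1 (from (1,10) to (1,20)) passes the test for a = 1, while rangep2 (from (2,10) to (2,20)) fails it and is pruned.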
2. You've added a list_copy() to get_partitions_from_clauses so as not
to modify the input list, but this function calls
get_partitions_from_clauses_recurse which calls
classify_partition_bounding_keys() which modifies that list. Would it
not just be better to make a list copy in
get_partitions_from_clauses() without any conditions?
If we get new users of that function, e.g. run-time pruning, then they
might be surprised to see new items magically added to their input
list without mention of that behaviour in the function comment.
3. The following case causes an Assert failure:
drop table listp;
CREATE TABLE listp (a int, b int) partition by list (a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);
prepare q1 (int) as select * from listp where a = 1 and a in($1,10);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1);
explain execute q1 (1); -- <--- Assert failure!
In match_clauses_to_partkey you always add the ScalarArrayOpExpr to
the result regardless of whether it is a complete set of Consts. However,
the code in classify_partition_bounding_keys() that deals with
ScalarArrayOpExpr can't handle non-Consts:
/* OK to add to the result. */
result = lappend(result, clause);
if (IsA(eval_const_expressions(root, rightop), Const))
    *contains_const = true;
else
    *contains_const = false;
*contains_const is reset to true again by the a = 1 qual, so
get_partitions_from_clauses() gets called from
get_append_rel_partitions. Later, when classify_partition_bounding_keys()
processes the ScalarArrayOpExpr, the following code assumes the
array exprs are all Consts:
foreach(lc1, elem_exprs)
{
    Const *rightop = castNode(Const, lfirst(lc1));
Setting *contains_const = false; in match_clauses_to_partkey() is not
correct either. If I understand the intent here correctly, you want
this to be set to true if the clause list contains quals with any
consts that are useful for partition pruning during planning. If
that's the case then you should set it to true if you find a suitable
clause, otherwise leave it set to false as you set it to at the start
of the function. What you have now will have varying results based on
the order of the clauses in the list, which is certainly not correct.
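The order dependence described in item 3 can be shown with a standalone C sketch. It is an illustration only: Clause and both functions are hypothetical stand-ins for the flag handling in match_clauses_to_partkey(), with each clause reduced to two booleans.

```c
#include <stdbool.h>

/* Hypothetical summary of one qual from the clause list. */
typedef struct Clause
{
    bool        matches_partkey;    /* usable for partition pruning? */
    bool        rhs_is_const;       /* comparison value is a Const? */
} Clause;

/* Mirrors the quoted code: the last matching clause decides the flag. */
bool
contains_const_order_dependent(const Clause *clauses, int nclauses)
{
    bool        contains_const = false;
    int         i;

    for (i = 0; i < nclauses; i++)
    {
        if (!clauses[i].matches_partkey)
            continue;
        if (clauses[i].rhs_is_const)
            contains_const = true;
        else
            contains_const = false; /* overwrites an earlier true */
    }
    return contains_const;
}

/* Suggested behaviour: only ever set the flag, never reset it. */
bool
contains_const_fixed(const Clause *clauses, int nclauses)
{
    bool        contains_const = false;
    int         i;

    for (i = 0; i < nclauses; i++)
    {
        if (clauses[i].matches_partkey && clauses[i].rhs_is_const)
            contains_const = true;
    }
    return contains_const;
}
```

For a list holding one clause with a Const and one with a Param, the first function returns a different answer depending on which clause comes last, while the second returns true for both orderings.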
4. The following code can be rearranged to not pull_varnos if there's
no commutator op.
constexpr = leftop;
constrelids = pull_varnos((Node *) leftop);
expr_op = get_commutator(expr_op);
/*
* If no commutator exists, cannot flip the qual's args,
* so give up.
*/
if (!OidIsValid(expr_op))
continue;
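The rearrangement suggested in item 4 could look roughly like the sketch below. It is standalone and uses hypothetical stand-ins (demo_get_commutator, demo_pull_varnos, demo_try_flip) rather than the real planner functions; the only assumption carried over from PostgreSQL is that a missing commutator is reported as an invalid (zero) Oid. A call counter makes it visible that the expression walk is skipped when the clause cannot be flipped.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int DemoOid;
#define DEMO_INVALID_OID 0u

/* Counts calls to the (comparatively expensive) expression walk. */
int			demo_pull_varnos_calls = 0;

/* Hypothetical stand-in for get_commutator(): only operator 10 has one (11). */
DemoOid
demo_get_commutator(DemoOid opno)
{
	return (opno == 10u) ? 11u : DEMO_INVALID_OID;
}

/* Hypothetical stand-in for pull_varnos(): "walks" the expression. */
int
demo_pull_varnos(int expr)
{
	demo_pull_varnos_calls++;
	return expr;
}

/*
 * Try to flip a "const op partkey" clause.  The commutator lookup comes
 * first, so the expression walk only happens when the flip is possible.
 */
bool
demo_try_flip(DemoOid *expr_op, int leftop, int *constrelids)
{
	DemoOid		commutator = demo_get_commutator(*expr_op);

	/* If no commutator exists, cannot flip the qual's args, so give up. */
	if (commutator == DEMO_INVALID_OID)
		return false;

	*expr_op = commutator;
	*constrelids = demo_pull_varnos(leftop);
	return true;
}
```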
5. Header comment for match_clauses_to_partkey() says only clauses in
the pattern of "partkey op const" and "const op partkey" are handled.
NULL tests are also mentioned but nothing is mentioned about
ScalarArrayOpExpr. It might be better to be less verbose about what
the function handles, but if you're listing what is handled then you
should not make false claims.
* For an individual clause to match with a partition key column, the clause
* must be an operator clause of the form (partkey op const) or (const op
* partkey); the latter only if a suitable commutator exists. Furthermore,
6. Which brings me to: why do we need match_clauses_to_partkey at all?
classify_partition_bounding_keys seems to do all the work
match_clauses_to_partkey does, plus more. Item #3 above is caused by
an inconsistency between these functions. What benefit does
match_clauses_to_partkey give? I might understand if you were creating
a list of clauses matching each partition key, but you're just dumping
everything in one big list, which causes
classify_partition_bounding_keys() to have to match each clause to a
partition key again, and classify_partition_bounding_keys is even
coded to ignore clauses that don't match any key, so it makes me
wonder what match_clauses_to_partkey is actually for.
I'm going to stop reviewing there, as removing
match_clauses_to_partkey is going to cause churn that'll need to be
reviewed again.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On Wed, Jan 17, 2018 at 12:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 16 January 2018 at 21:08, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/12 12:30, David Rowley wrote:
8. The code in get_partitions_from_ne_clauses() does perform quite a
few nested loops. I think a simpler way would be to track the
offsets you've seen in a Bitmapset. This would save you having to
check for duplicates, as an offset can only contain a single datum.
You'd just need to build a couple of arrays after that, one to sum up
the offsets found per partition, and one for the total datums allowed
in the partition. If the numbers match then you can remove the
partition.
I've written this and attached it to this email. It saves about 50
lines of code and should perform much better for complex cases, for
example, a large NOT IN list. This also implements #6.
I liked your patch, so incorporated it, except, I feel slightly
uncomfortable about the new name that you chose for the function because
it sounds a bit generic.
You're right. I only renamed it because I inverted the meaning of the
function in the patch. It no longer did
"get_partitions_from_ne_clauses"; it did the opposite and gave the
partitions which can't match. Please feel free to think of a new
better name. Is "get_partitions_excluded_by_ne_clauses" too long?
I think you quite often use "the same" to mean "it". Can you change that?
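For reference, the offset-tracking idea from item 8 above can be sketched as follows. This is a simplified standalone illustration, not the attached patch: plain integer bitmasks stand in for Bitmapset, a linear search stands in for the bound lookup, and the names (demo_parts_excluded_by_ne, datums, part_of) are hypothetical. Default and NULL-accepting partitions are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PARTS	32		/* partitions fit in the uint32_t result */
#define DEMO_MAX_DATUMS 64		/* bound offsets fit in the uint64_t set */

/*
 * datums[i] is a list bound value and part_of[i] the partition owning it.
 * ne_values holds the constants from "key <> value" clauses.  Returns a
 * bitmask of partitions whose every allowed value is ruled out.
 */
uint32_t
demo_parts_excluded_by_ne(const int *datums, const int *part_of,
						  size_t ndatums,
						  const int *ne_values, size_t n_ne)
{
	uint64_t	seen = 0;
	int			seen_per_part[DEMO_MAX_PARTS] = {0};
	int			total_per_part[DEMO_MAX_PARTS] = {0};
	uint32_t	excluded = 0;

	assert(ndatums <= DEMO_MAX_DATUMS);

	/* Record the offset of each bound hit by a <> clause; the set absorbs duplicates. */
	for (size_t j = 0; j < n_ne; j++)
		for (size_t i = 0; i < ndatums; i++)
			if (datums[i] == ne_values[j])
				seen |= UINT64_C(1) << i;

	/* Tally seen and total datums per partition. */
	for (size_t i = 0; i < ndatums; i++)
	{
		assert(part_of[i] >= 0 && part_of[i] < DEMO_MAX_PARTS);
		total_per_part[part_of[i]]++;
		if (seen & (UINT64_C(1) << i))
			seen_per_part[part_of[i]]++;
	}

	/* A partition can be removed when all of its datums were seen. */
	for (int p = 0; p < DEMO_MAX_PARTS; p++)
		if (total_per_part[p] > 0 && seen_per_part[p] == total_per_part[p])
			excluded |= UINT32_C(1) << p;

	return excluded;
}
```

Because each offset is a single bit in the set, a value repeated in a NOT IN list is counted once without any explicit duplicate check.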
I guess that's just one of my many odd habits when writing English, all of
which I'm trying to get rid of, but apparently with limited success. Must
try harder. :)
Oops, on re-reading that it sounded as though I was asking you to
change some habit, but I just meant the comments. I understand there
will be places that use English where that's normal. It's just I don't
recall seeing that in PostgreSQL code before.
No worries, I too slightly misread what you'd said.
When I double checked, I too couldn't find "the same" used the way
I did in the patch. So I actually ended up finding and replacing more
"the same"s with "it" than you had pointed out in your review in the
latest v20 patch.
American English is
pretty much the standard for the project, despite that not always
being strictly applied (e.g. we have a command called ANALYSE which is
an alias for ANALYZE). I always try and do my best to spell words in
American English (which is not where I'm from), which for me stretches
about as far as putting 'z' in the place of some of my 's'es.
I see.
15. "the latter" is normally used when you're referring to the last
thing in a list which was just mentioned. In this case, leftarg_const
and rightarg_const is the list, so "the latter" should mean
rightarg_const, but I think you mean to compare them using the
operator.
* If the leftarg_const and rightarg_const are both of the type expected
* by op's operator, then compare them using the latter.
Rewrote it as:
* We can compare leftarg_const and rightarg_const using op's operator
* only if both are of the type expected by it.
I'd probably write "expected type." instead of "type expected by it."
OK, will do.
17. The following example will cause get_partitions_for_keys_hash to misbehave:
create table hashp (a int, b int) partition by hash (a, b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
explain select * from hashp where a = 1 and a is null;
[ ... ]
The above code will need to be made smarter. It'll likely crash if you
change "b" to a pass-by-ref type.
Hmm, not sure why. It seems to work:
Yeah, works now because you've added new code to test for
contradictions in the quals, e.g. a = 1 and a is null is now rejected
as constfalse.
Oh, I see. I thought you were talking of it as an independent issue.
Thanks,
Amit
On 17 January 2018 at 17:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
6. Which brings me to: why do we need match_clauses_to_partkey at all?
classify_partition_bounding_keys seems to do all the work
match_clauses_to_partkey does, plus more. Item #3 above is caused by
an inconsistency between these functions. What benefit does
match_clauses_to_partkey give? I might understand if you were creating
a list of clauses matching each partition key, but you're just dumping
everything in one big list, which causes
classify_partition_bounding_keys() to have to match each clause to a
partition key again, and classify_partition_bounding_keys is even
coded to ignore clauses that don't match any key, so it makes me
wonder what match_clauses_to_partkey is actually for.
I started to look at this and ended up shuffling the patch around a
bit to completely remove the match_clauses_to_partkey function.
I also cleaned up some of the comments and shuffled some fields around
in some of the structs to shrink them down a bit.
All up, this has saved 268 lines of code in the patch.
src/backend/catalog/partition.c | 296 ++++++++++++++++-----------
src/backend/optimizer/path/allpaths.c | 368 ++--------------------------------
2 files changed, 198 insertions(+), 466 deletions(-)
It's had very minimal testing. Really I've only tested that the
regression tests pass.
I also fixed up the bad assumption that IN lists will contain Consts
only which hopefully fixes the crash I reported earlier.
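The shape of that fix, as a standalone sketch: test the node's tag before treating an array element as a Const, instead of using a checked cast that asserts. The types and the function below (DemoNode, DemoConst, DemoParam, demo_is_null_const) are hypothetical miniatures of the tagged-node idiom, not PostgreSQL's own definitions. In the failing example above, the non-Const element is presumably the $1 parameter, which is no longer folded to a constant once a generic plan is built.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a tagged node hierarchy. */
typedef enum DemoTag
{
	DEMO_CONST,
	DEMO_PARAM
} DemoTag;

typedef struct DemoNode
{
	DemoTag		tag;
} DemoNode;

typedef struct DemoConst
{
	DemoNode	node;			/* must be first so the pointer casts are valid */
	bool		constisnull;
	int			value;
} DemoConst;

typedef struct DemoParam
{
	DemoNode	node;
	int			paramid;
} DemoParam;

/*
 * True only for a Const holding NULL.  Any other node type simply yields
 * false, rather than tripping an assertion as an unconditional cast would.
 */
bool
demo_is_null_const(const DemoNode *n)
{
	return n->tag == DEMO_CONST && ((const DemoConst *) n)->constisnull;
}
```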
I saw you'd added a check to look for contradicting IS NOT NULL
clauses when processing an IS NULL clause, but didn't do anything for
the opposite case. I added code for this so it behaves the same
regardless of the clause order.
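A minimal sketch of that symmetric check, assuming plain bitmasks in place of the keyisnull and keyisnotnull Bitmapsets. DemoNullness and demo_add_nulltest are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two nullness sets, one bit per partition key. */
typedef struct DemoNullness
{
	uint32_t	keyisnull;
	uint32_t	keyisnotnull;
} DemoNullness;

/*
 * Record an IS NULL (is_null = true) or IS NOT NULL test on partition key
 * 'key'.  Returns false if it contradicts a test recorded earlier, which
 * is the case where the real code would set *constfalse.
 */
bool
demo_add_nulltest(DemoNullness *state, int key, bool is_null)
{
	uint32_t	bit = UINT32_C(1) << key;

	if (is_null)
	{
		/* check for conflicting IS NOT NULLs */
		if (state->keyisnotnull & bit)
			return false;
		state->keyisnull |= bit;
	}
	else
	{
		/* check for conflicting IS NULLs */
		if (state->keyisnull & bit)
			return false;
		state->keyisnotnull |= bit;
	}
	return true;
}
```

Checking both directions means "a IS NULL AND a IS NOT NULL" and "a IS NOT NULL AND a IS NULL" are both detected as contradictory.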
Can you look at my changes and see if I've completely broken anything?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
faster_partition_prune_v20_delta_drowley.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 974febb..acd29eb 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -155,7 +155,6 @@ typedef struct PartitionRangeBound
*/
typedef struct PartitionBoundCmpArg
{
- bool is_bound;
union
{
PartitionListValue *lbound;
@@ -165,6 +164,7 @@ typedef struct PartitionBoundCmpArg
Datum *datums;
int ndatums;
+ bool is_bound;
} PartitionBoundCmpArg;
/*
@@ -177,7 +177,7 @@ typedef struct PartClause
Expr *constarg;
/* cached info. */
- bool valid_cache; /* Is the following information initialized? */
+ bool valid_cache; /* Are the following fields populated? */
int op_strategy;
Oid op_subtype;
FmgrInfo op_func;
@@ -199,45 +199,43 @@ typedef enum PartOpStrategy
* Information about partition look up keys to be passed to
* get_partitions_for_keys()
*
- * This information is extracted from the query's mutually conjunctive operator
- * clauses, each of whose variable argument is matched to a partition key and
- * operator is checked to be contained in the corresponding column's partition
- * operator family.
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the partition
+ * key index.
*/
typedef struct PartScanKeyInfo
{
/*
- * Equality look up key. Values in the following array appear in no
- * particular order (unlike minkeys and maxkeys below which must appear in
- * the same order as the partition key columns). n_eqkeys must be equal to
- * the number of partition keys to be valid (except in the case of hash
- * partitioning where that's not required). When set, minkeys and maxkeys
- * are ignored.
+ * Equality look up key. Used to store known Datums values from clauses
+ * compared by an equality operation to the partition key.
*/
Datum eqkeys[PARTITION_MAX_KEYS];
- int n_eqkeys;
/*
- * Lower and upper bounds on a sequence of selected partitions. Values in
- * the following arrays must appear in the same order as the partition key
- * columns and may contain values for only a prefix of the partition key
- * columns. If *_incl is true then the corresponding bound is inclusive
- * and hence the partition into which the bound falls is to be included in
- * the set of selected partitions.
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
*/
Datum minkeys[PARTITION_MAX_KEYS];
- int n_minkeys;
- bool min_incl;
-
Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
bool max_incl;
/*
- * Information about nullness of partition keys, either specified
+ * Information about nullness of the partition keys, either specified
* explicitly in the query (in the form of a IS [NOT] NULL clause) or
- * implied due to the assumption of strictness of the partitioning
- * operators.
+ * implied from strict clauses matching the partition key.
*/
Bitmapset *keyisnull;
Bitmapset *keyisnotnull;
@@ -293,8 +291,8 @@ PG_FUNCTION_INFO_V1(satisfies_hash_partition);
static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
int rt_index, List *clauses);
-static Bitmapset *get_partitions_excluded_by(Relation relation,
- List *ne_clauses);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(Relation relation,
+ List *ne_clauses);
static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
int rt_index, List *or_clause_args);
static bool classify_partition_bounding_keys(Relation relation, List *clauses,
@@ -1692,22 +1690,26 @@ get_partition_qual_relid(Oid relid)
/*
* get_partitions_from_clauses
- * Determine the set of partitions of 'relation' that will satisfy all
- * the clauses contained in 'partclauses'
+ * Determine all partitions of 'relation' that could possibly contain a
+ * record that matches 'partclauses'
*
- * Outputs:
- * A Bitmapset containing indexes of all selected partitions.
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
*/
Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
List *partclauses)
{
- Bitmapset *result;
- List *partconstr;
+ List *clauses;
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
PartitionBoundInfo boundinfo = partdesc->boundinfo;
- Assert(partclauses != NIL);
+ /* All partitions match if there are no clauses */
+ if (!partclauses)
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(partclauses);
/*
* If relation is a partition itself, add its partition constraint
@@ -1719,19 +1721,17 @@ get_partitions_from_clauses(Relation relation, int rt_index,
* set of selected partitions for a query whose clauses select a key space
* bigger than the partition's.
*/
- if (partition_bound_has_default(boundinfo) &&
- (partconstr = RelationGetPartitionQual(relation)) != NIL)
+ if (partition_bound_has_default(boundinfo))
{
- partconstr = (List *) expression_planner((Expr *) partconstr);
+ List *partqual = RelationGetPartitionQual(relation);
- /* Be careful not to modify the input list. */
- partclauses = list_concat(list_copy(partclauses), partconstr);
- }
+ partqual = (List *) expression_planner((Expr *) partqual);
- result = get_partitions_from_clauses_recurse(relation, rt_index,
- partclauses);
+ clauses = list_concat(clauses, partqual);
+ }
- return result;
+ return get_partitions_from_clauses_recurse(relation, rt_index,
+ clauses);
}
/* Module-local functions */
@@ -1771,29 +1771,35 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
return NULL;
result = get_partitions_for_keys(relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
}
else
{
+ /*
+ * We found nothing useful to indicate which partitions might need to
+ * be scanned. Perhaps we'll find something below that indicates
+ * which ones won't need to be scanned.
+ */
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
result = bms_add_range(NULL, 0, partdesc->nparts - 1);
}
- /*
- * No point in trying to look at other conjunctive clauses, if we got
- * an empty set in the first place.
- */
- if (bms_is_empty(result))
- return NULL;
-
/* Select partitions by applying the clauses containing <> operators. */
if (ne_clauses)
{
Bitmapset *ne_clause_parts;
- ne_clause_parts = get_partitions_excluded_by(relation, ne_clauses);
+ ne_clause_parts = get_partitions_excluded_by_ne_clauses(relation,
+ ne_clauses);
- /* Remove any matched partitions */
+ /* Remove any partitions we found to not be needed */
result = bms_del_members(result, ne_clause_parts);
bms_free(ne_clause_parts);
}
@@ -1820,14 +1826,14 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
}
/*
- * get_partitions_excluded_by
+ * get_partitions_excluded_by_ne_clauses
*
* Returns a Bitmapset of partition indexes of any partition that can safely
* be removed due to 'ne_clauses' containing not-equal clauses for all
* possible values that the partition can contain.
*/
static Bitmapset *
-get_partitions_excluded_by(Relation relation, List *ne_clauses)
+get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
{
ListCell *lc;
Bitmapset *excluded_parts = NULL;
@@ -1979,45 +1985,47 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
/* Match partition key (partattno/partexpr) to an expression (expr). */
#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
- ((partattno) != 0 ?\
- (IsA((expr), Var) &&\
- ((Var *) (expr))->varattno == (partattno)) :\
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
equal((expr), (partexpr)))
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
/*
* classify_partition_bounding_keys
- * Analyzes partition clauses to collect the equality key or minimum and
- * maximum bounding keys using which to look up partitions of relation.
- * Also collects information about the nullness of the individual
- * partition key columns as the partitions may have certain properties
- * with respect to null values. Keys and nullness information are stored
- * in the output argument *keys.
+ * Analyzes 'clauses' clauses to collect the equality key, minimum,
+ * maximum bounding keys. We also collect nullability properties along
+ * the way. All these are then used to determine the minimum set of
+ * partitions that need to be scanned to find all records matching
+ * the clause list.
*
- * Clauses in the provided list are assumed to be implicitly ANDed, each of
- * which is known to match some partition key column. They're mapped to the
- * individual key columns and for each column, we find constant values that
- * are compared to the column using operators that are compatible with
- * partitioning. For example, if there is a clause a = 4 where a is a
- * partition key column, then 4 is stored as the equality key if = is
- * partitioning equality operator. If there are clauses a > 1 and a < 5, then
- * 1 and 5 are stored as the minimum and maximum bounding keys, if > and < are
- * partitioning less and greater operators, respectively. If there are
- * multiple clauses addressing a given column, we first try to check if they
- * are mutually contradictory and set *constfalse if so. For example, if there
- * are clauses a = 1 and a = 2 in the list, then clearly both will never be
- * true. Similarly for a > 1 and a < 0. For clauses containing ordering
+ * Clauses found to match a partition are analyzed to determine if the
+ * clause is useful for partition elimination. For this to work the value
+ * being compared to the partition key must be a known value, e.g. a Const.
+ * We attempt to determine the narrowest range of values that will match by
+ * collecting and storing values that further narrow the range of the possible
+ * partitions to scan. For example, if x is a partition key and we see a
+ * clause such as x = 4, 4 is stored as the equality key if = is partitioning
+ * equality operator. If there are clauses x > 1 and x < 5, then 1 and 5 are
+ * stored as the minimum and maximum bounding keys, respectively, providing
+ * that > and < are partitioning less and greater operators. If there are
+ * multiple clauses matching a given column, we first try to check if they are
+ * mutually contradictory and, if so set *constfalse to true. For example, if
+ * there are clauses x = 1 and x = 2 in the list, then clearly both will never
+ * be true. Similarly for x > 1 and x < 0. For clauses containing ordering
* operators that are non-contradictory, we try to find the one that is the
- * most restrictive and discard others. For example, of a > 1, a > 2, and
- * a >= 5, the last one is the most restrictive and so 5 is the best minimum
+ * most restrictive and discard others. For example, of x > 1, x > 2, and
+ * x >= 5, the last one is the most restrictive and so 5 is the best minimum
* bound (which also happens to be inclusive), so it is kept while discarding
- * both a > 1 and a > 2.
+ * both x > 1 and x > 2.
*
- * For multi-column keys, an equality key needs to contain values corresponding
- * to *all* partition key columns in the range patitioning case, whereas it's
- * not necessary for hash partitioning. Actually, the latter requires that
- * the remaining columns are covered by IS NULL clauses, but that's not checked
- * in this function. Minimum and maximum bound keys are allowed to contain
- * values for only a prefix partition key columns.
+ * For RANGE partitioning we do not need to match all partition keys. We may
+ * be able to eliminate some partitions with just a prefix of the partition
+ * keys. HASH partitioning does require all keys are matched to with at least
+ * some combinations of equality clauses and IS NULL clauses. LIST partitions
+ * don't support multiple partition keys.
*
* Certain kinds of clauses are not immediately handled within this function
* and are instead returned to the caller for further processing. That
@@ -2025,10 +2033,10 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
* generated from ScalarArrayOpExpr clauses in the input list that have useOr
* set to true), which are returned to the caller in *or_clauses and clauses
* containing a <> operator (whose negator is a valid *list* partitioning
- * equality operator), which are returned to the caller to in *ne_clauses.
+ * equality operator), which are returned to the caller via *ne_clauses.
*
- * True is returned if *keys contains valid information upon return or if
- * *constfalse is set to true.
+ * True is returned if *keys contains useful information and also if *constfalse
+ * has been set to true.
*/
static bool
classify_partition_bounding_keys(Relation relation, List *clauses,
@@ -2048,7 +2056,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
bool need_next_eq,
need_next_min,
need_next_max;
- int n_keynullness = 0;
+ bool got_nullcheck = false;
*or_clauses = NIL;
*ne_clauses = NIL;
@@ -2097,22 +2105,21 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
{
Oid partopfamily = partkey->partopfamily[i];
AttrNumber partattno = partkey->partattrs[i];
+ Oid partcoll = partkey->partcollation[i];
Expr *partexpr = NULL;
PartClause *pc;
+ Oid commutator = InvalidOid;
/*
- * A non-zero partattno refers to a simple column reference that
- * will be matched against varattno of a Var appearing the clause.
- * partattno == 0 refers to arbitrary expressions, which get the
- * current one from PartitionKey.
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
*/
if (partattno == 0)
{
if (partexprs_item == NULL)
elog(ERROR, "wrong number of partition key expressions");
- /* Copy to avoid overwriting the relcache's content. */
- partexpr = copyObject(lfirst(partexprs_item));
+ partexpr = (Expr *) lfirst(partexprs_item);
/*
* Expressions stored in PartitionKey in the relcache all
@@ -2121,7 +2128,11 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
* correctly matched to the expressions coming from the query.
*/
if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
partexprs_item = lnext(partexprs_item);
}
@@ -2140,15 +2151,43 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
rightop = (Expr *) get_rightop(clause);
if (IsA(rightop, RelabelType))
rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches the partition key */
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
constexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
else
/* Clause does not match this partition key. */
continue;
/*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
* Handle cases where the clause's operator does not belong to
* the partitioning operator family. We currently handle two
* such cases: 1. Operators named '<>' are not listed in any
@@ -2190,30 +2229,23 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
continue;
}
- pc = palloc0(sizeof(PartClause));
+ pc = (PartClause *) palloc0(sizeof(PartClause));
pc->constarg = constexpr;
/*
- * Flip the left and right args if we have to, because the
- * code which extract the constant value to use for
- * partition-pruning expects to find it as the rightop of the
- * clause. (See below in this function.)
+ * If commutator is set to a valid Oid then we'll need to swap
+ * the left and right operands. Later code requires that the
+ * partkey is on the left side.
*/
- if (constexpr == rightop)
+ if (!OidIsValid(commutator))
pc->op = opclause;
else
{
OpExpr *commuted;
- Oid commutator = get_commutator(opclause->opno);
- /*
- * Caller must have made sure to check that the commutator
- * indeed exists.
- */
- Assert(OidIsValid(commutator));
commuted = (OpExpr *) copyObject(opclause);
commuted->opno = commutator;
- commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->opfuncid = get_opcode(commutator);
commuted->args = list_make2(rightop, leftop);
pc->op = commuted;
}
@@ -2221,7 +2253,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
/*
* We don't turn a <> operator clause into a key right away.
* Instead, the caller will hand over such clauses to
- * get_partitions_excluded_by().
+ * get_partitions_excluded_by_ne_clauses().
*/
if (is_ne_listp)
*ne_clauses = lappend(*ne_clauses, pc);
@@ -2255,13 +2287,32 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
ListCell *lc1;
bool negated = false;
- /* Clause does not match this partition key. */
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
/*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle it if its negator is indeed a part of the
@@ -2343,10 +2394,10 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
elem_clauses = NIL;
foreach(lc1, elem_exprs)
{
- Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *rightop = (Expr *) lfirst(lc1);
Expr *elem_clause;
- if (rightop->constisnull)
+ if (IsA(rightop, Const) && ((Const *) rightop)->constisnull)
{
NullTest *nulltest = makeNode(NullTest);
@@ -2399,6 +2450,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
{
if (nulltest->nulltesttype == IS_NULL)
{
+ /* check for conflicting IS NOT NULLs */
if (bms_is_member(i, keyisnotnull))
{
*constfalse = true;
@@ -2407,8 +2459,17 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
keyisnull = bms_add_member(keyisnull, i);
}
else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, keyisnull))
+ {
+ *constfalse = true;
+ return true;
+ }
+
keyisnotnull = bms_add_member(keyisnotnull, i);
- n_keynullness++;
+ }
+ got_nullcheck = true;
will_compute_keys = true;
}
}
@@ -2431,9 +2492,15 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
{
BooleanTest *btest = (BooleanTest *) clause;
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
leftop = btest->arg;
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
/* Clause does not match this partition key. */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
@@ -2450,6 +2517,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
: (Expr *) get_notclausearg((Expr *) clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
/* Clause does not match this partition key. */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
@@ -2616,7 +2684,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
keys->keyisnotnull = keyisnotnull;
return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
- keys->n_maxkeys > 0 || n_keynullness > 0);
+ keys->n_maxkeys > 0 || got_nullcheck);
}
/*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e7c7a6e..51648c8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -858,54 +858,24 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/*
* get_append_rel_partitions
- * Return the list of partitions of rel that pass the clauses mentioned
- * in rel->baserestrictinfo. An empty list is returned if no matching
- * partitions were found.
- *
- * Returned list contains the AppendRelInfos of chosen partitions.
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
*/
static List *
get_append_rel_partitions(PlannerInfo *root,
RelOptInfo *rel,
RangeTblEntry *rte)
{
- List *partclauses;
- bool contains_const,
- constfalse;
- List *result = NIL;
- int i;
- Relation parent;
- PartitionDesc partdesc;
+ Relation partrel;
Bitmapset *partindexes;
+ List *result = NIL;
+ int i;
- /*
- * Get the clauses that match the partition key. It's also a good idea
- * to check if the matched clauses contain constant values that can be
- * used for pruning and go to get_partitions_from_clauses() only if so.
- * If rel->baserestrictinfo might contain mutually contradictory clauses,
- * also find out about that.
- */
- partclauses = match_clauses_to_partkey(root, rel, rel->baserestrictinfo,
- &contains_const, &constfalse);
+ partrel = heap_open(rte->relid, NoLock);
- /* We're done here. */
- if (constfalse)
- return NIL;
-
- parent = heap_open(rte->relid, NoLock);
- partdesc = RelationGetPartitionDesc(parent);
-
- if (partclauses != NIL && contains_const)
- partindexes = get_partitions_from_clauses(parent, rel->relid,
- partclauses);
- else
- {
- /*
- * There are no clauses that are useful to prune any partitions, so
- * scan all partitions.
- */
- partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
- }
+ partindexes = get_partitions_from_clauses(partrel, rel->relid,
+ rel->baserestrictinfo);
/* Fetch the partition appinfos. */
i = -1;
@@ -914,328 +884,22 @@ get_append_rel_partitions(PlannerInfo *root,
AppendRelInfo *appinfo = rel->part_appinfos[i];
#ifdef USE_ASSERT_CHECKING
- RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
/*
* Must be the intended child's RTE here, because appinfos are ordered
* the same way as partitions in the partition descriptor.
*/
- Assert(partdesc->oids[i] == rte->relid);
+ Assert(partdesc->oids[i] == childrte->relid);
#endif
+
result = lappend(result, appinfo);
}
- heap_close(parent, NoLock);
-
- return result;
-}
-
-#define PartCollMatchesExprColl(partcoll, exprcoll) \
- ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
-
-/*
- * match_clauses_to_partkey
- * Match clauses with rel's partition key
- *
- * Returned list contains clauses matched to the partition key columns and
- * *contains_const and *constfalse are set as described below.
- *
- * For an individual clause to match with a partition key column, the clause
- * must be an operator clause of the form (partkey op const) or (const op
- * partkey); the latter only if a suitable commutator exists. Furthermore,
- * the operator must be strict and its input collation must match the partition
- * collation. The aforementioned "const" means any expression that doesn't
- * involve a volatile function or a Var of this relation. We allow Vars
- * belonging to other relations (for example, if the clause is a join clause),
- * but they are treated as parameters whose values are not known now, so cannot
- * be used for partition pruning right within the planner. It's the
- * responsibility of higher code levels to manage restriction and join clauses
- * appropriately. If a NullTest against a partition key is encountered, it's
- * added to the result as well.
- *
- * *contains_const is set if at least one matched clauses contains the constant
- * operand or is a Nullness test. *constfalse is set if the input list
- * contains a pseudo-constant RestrictInfo with false value.
- */
-static List *
-match_clauses_to_partkey(PlannerInfo *root,
- RelOptInfo *rel,
- List *clauses,
- bool *contains_const,
- bool *constfalse)
-{
- PartitionScheme partscheme = rel->part_scheme;
- List *result = NIL;
- ListCell *lc;
-
- *contains_const = false;
- *constfalse = false;
-
- Assert (partscheme != NULL);
-
- /* Make a copy, because we may scribble on it below. */
- clauses = list_copy(clauses);
-
- foreach(lc, clauses)
- {
- Node *member = lfirst(lc);
- Expr *clause;
- int i;
-
- if (IsA(member, RestrictInfo))
- {
- RestrictInfo *rinfo = (RestrictInfo *) member;
-
- clause = rinfo->clause;
- if (rinfo->pseudoconstant &&
- (IsA(clause, Const) &&
- ((((Const *) clause)->constisnull) ||
- !DatumGetBool(((Const *) clause)->constvalue))))
- {
- *constfalse = true;
- return NIL;
- }
- }
- else
- clause = (Expr *) member;
-
- /*
- * For a BoolExpr, we should try to match each of its args with the
- * partition key as described below for each type.
- */
- if (IsA(clause, BoolExpr))
- {
- if (or_clause((Node *) clause))
- {
- /*
- * For each of OR clause's args, call this function
- * recursively with a given arg as the only member in the
- * input list and see if it's returned as matching the
- * partition key. Add the OR clause to the result iff at
- * least one of its args contain a matching clause.
- */
- BoolExpr *orclause = (BoolExpr *) clause;
- ListCell *lc1;
- bool arg_matches_key = false,
- matched_arg_contains_const = false,
- all_args_constfalse = true;
-
- foreach (lc1, orclause->args)
- {
- Node *arg = lfirst(lc1);
- bool contains_const1,
- constfalse1;
-
- if (match_clauses_to_partkey(root, rel, list_make1(arg),
- &contains_const1,
- &constfalse1) != NIL)
- {
- arg_matches_key = true;
- matched_arg_contains_const = contains_const1;
- }
-
- /* We got at least one arg that is not constant false. */
- if (!constfalse1)
- all_args_constfalse = false;
- }
-
- if (arg_matches_key)
- {
- result = lappend(result, clause);
- *contains_const = matched_arg_contains_const;
- }
-
- /* OR clause is "constant false" if all of its args are. */
- *constfalse = all_args_constfalse;
- continue;
- }
- else if (and_clause((Node *) clause))
- {
- /*
- * Since the clause is itself implicitly ANDed with other
- * clauses in the input list, queue the args to be processed
- * later as if they were part of the original input list.
- */
- clauses = list_concat(clauses,
- list_copy(((BoolExpr *) clause)->args));
- continue;
- }
-
- /* Fall-through for a NOT clause, which is handled below. */
- }
-
- for (i = 0; i < partscheme->partnatts; i++)
- {
- Node *partkey = linitial(rel->partexprs[i]);
- Oid partopfamily = partscheme->partopfamily[i],
- partcoll = partscheme->partcollation[i];
-
- /*
- * Check if the clauses matches the partition key and add it to
- * the result list if other things such as operator input
- * collation, strictness, etc. look fine.
- */
- if (is_opclause(clause))
- {
- Expr *constexpr,
- *leftop,
- *rightop;
- Relids constrelids;
- Oid expr_op,
- expr_coll;
-
- leftop = (Expr *) get_leftop(clause);
- rightop = (Expr *) get_rightop(clause);
- expr_op = ((OpExpr *) clause)->opno;
- expr_coll = ((OpExpr *) clause)->inputcollid;
-
- if (IsA(leftop, RelabelType))
- leftop = ((RelabelType *) leftop)->arg;
- if (IsA(rightop, RelabelType))
- rightop = ((RelabelType *) rightop)->arg;
-
- if (equal(leftop, partkey))
- {
- constexpr = rightop;
- constrelids = pull_varnos((Node *) rightop);
- }
- else if (equal(rightop, partkey))
- {
- constexpr = leftop;
- constrelids = pull_varnos((Node *) leftop);
- expr_op = get_commutator(expr_op);
-
- /*
- * If no commutator exists, cannot flip the qual's args,
- * so give up.
- */
- if (!OidIsValid(expr_op))
- continue;
- }
- else
- /* Neither argument matches the partition key. */
- continue;
-
- /*
- * Useless if what we're thinking of as a constant is actually
- * a Var coming from this relation.
- */
- if (bms_is_member(rel->relid, constrelids))
- continue;
-
- /*
- * Also, useless, if the clause's collation is different from
- * the partitioning collation.
- */
- if (!PartCollMatchesExprColl(partcoll, expr_coll))
- continue;
-
- /*
- * Only allow strict operators to think sanely about the
- * behavior with null arguments.
- */
- if (!op_strict(expr_op))
- continue;
-
- /* Useless if the "constant" can change its value. */
- if (contain_volatile_functions((Node *) constexpr))
- continue;
-
- /*
- * Everything seems to be fine, so add it to the list of
- * clauses we will use for pruning.
- */
- result = lappend(result, clause);
-
- if (!*contains_const)
- *contains_const = IsA(constexpr, Const);
- }
- else if (IsA(clause, ScalarArrayOpExpr))
- {
- ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
- Oid saop_op = saop->opno;
- Oid saop_coll = saop->inputcollid;
- Node *leftop = (Node *) linitial(saop->args),
- *rightop = (Node *) lsecond(saop->args);
-
- if (IsA(leftop, RelabelType))
- leftop = (Node *) ((RelabelType *) leftop)->arg;
- if (!equal(leftop, partkey))
- continue;
-
- /* Check if saop_op is compatible with partitioning. */
- if (!op_strict(saop_op))
- continue;
-
- /* Useless if the "constant" can change its value. */
- if (contain_volatile_functions((Node *) rightop))
- continue;
-
- /*
- * Also, useless, if the clause's collation is different from
- * the partitioning collation.
- */
- if (!PartCollMatchesExprColl(partcoll, saop_coll))
- continue;
-
- /* OK to add to the result. */
- result = lappend(result, clause);
- if (IsA(eval_const_expressions(root, rightop), Const))
- *contains_const = true;
- else
- *contains_const = false;
- }
- else if (IsA(clause, NullTest))
- {
- NullTest *nulltest = (NullTest *) clause;
- Node *arg = (Node *) nulltest->arg;
-
- if (equal(arg, partkey))
- {
- result = lappend(result, nulltest);
- /* A Nullness test can be used right away. */
- *contains_const = true;
- }
- }
- /*
- * Certain Boolean conditions have a special shape, which we
- * accept if the partitioning opfamily accepts Boolean conditions.
- */
- else if (IsBooleanOpfamily(partopfamily) &&
- (IsA(clause, BooleanTest) ||
- IsA(clause, Var) || not_clause((Node *) clause)))
- {
- /*
- * Only accept those for pruning that appear to be
- * IS [NOT] TRUE/FALSE.
- */
- if (IsA(clause, BooleanTest))
- {
- BooleanTest *btest = (BooleanTest *) clause;
- Expr *arg = btest->arg;
-
- if (btest->booltesttype != IS_UNKNOWN &&
- btest->booltesttype != IS_NOT_UNKNOWN &&
- equal((Node *) arg, partkey))
- result = lappend(result, clause);
- }
- else if (IsA(clause, Var))
- {
- if (equal((Node *) clause, partkey))
- result = lappend(result, clause);
- }
- else
- {
- Node *arg = (Node *) get_notclausearg((Expr *) clause);
-
- if (equal(arg, partkey))
- result = lappend(result, clause);
- }
-
- *contains_const = true;
- }
- }
- }
+ heap_close(partrel, NoLock);
return result;
}
Hi David.
On Wed, Jan 17, 2018 at 6:19 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 17 January 2018 at 17:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
6. Which brings me to; why do we need match_clauses_to_partkey at all?
classify_partition_bounding_keys seems to do all the work
match_clauses_to_partkey does, plus more. Item #3 above is caused by
an inconsistency between these functions. What benefit does
match_clauses_to_partkey give? I might understand if you were creating
list of clauses matching each partition key, but you're just dumping
everything in one big list which causes
classify_partition_bounding_keys() to have to match each clause to a
partition key again, and classify_partition_bounding_keys is even
coded to ignore clauses that don't match any key, so it makes me
wonder what is match_clauses_to_partkey actually for?
I started to look at this and ended up shuffling the patch around a
bit to completely remove the match_clauses_to_partkey function.
I also cleaned up some of the comments and shuffled some fields around
in some of the structs to shrink them down a bit.
All up, this has saved 268 lines of code in the patch.
src/backend/catalog/partition.c | 296 ++++++++++++++++-----------
src/backend/optimizer/path/allpaths.c | 368 ++--------------------------------
2 files changed, 198 insertions(+), 466 deletions(-)
It's had very minimal testing. Really I've only tested that the
regression tests pass.
I also fixed up the bad assumption that IN lists will contain Consts
only which hopefully fixes the crash I reported earlier.
I saw you'd added a check to look for contradicting IS NOT NULL
clauses when processing an IS NULL clause, but didn't do anything for
the opposite case. I added code for this so it behaves the same
regardless of the clause order.
Can you look at my changes and see if I've completely broken anything?
Thanks for the patch. I applied the patch and see that it didn't
break any tests, although I haven't closely reviewed the code yet.
I'm concerned that after your patch to remove
match_clauses_to_partkey(), we'd be doing more work than necessary in
some cases. For example, consider the case of using run-time pruning
for nested loop where the inner relation is a partitioned table. With
the old approach, get_partitions_from_clauses() would only be handed
the clauses that are known to match the partition keys (which most
likely is fewer than all of the query's clauses), so
get_partitions_from_clauses() doesn't have to do the work of filtering
non-partition clauses every time (that is, for every outer row).
That's why I had decided to keep that part in the planner.
Thanks,
Amit
On 17 January 2018 at 23:48, Amit Langote <amitlangote09@gmail.com> wrote:
I'm concerned that after your patch to remove
match_clauses_to_partkey(), we'd be doing more work than necessary in
some cases. For example, consider the case of using run-time pruning
for nested loop where the inner relation is a partitioned table. With
the old approach, get_partitions_from_clauses() would only be handed
the clauses that are known to match the partition keys (which most
likely is fewer than all of the query's clauses), so
get_partitions_from_clauses() doesn't have to do the work of filtering
non-partition clauses every time (that is, for every outer row).
That's why I had decided to keep that part in the planner.
That might be better served by splitting
classify_partition_bounding_keys() into separate functions, the first
function would be in charge of building keyclauses_all. That way the
remaining work during the executor would never need to match clauses
to a partition key as they'd be in lists dedicated to each key.
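To make the proposed split concrete, here is a minimal toy sketch in C. It is not code from the patch: every name in it (ToyClause, ToyKeyClauses, toy_extract_key_clauses, toy_extract_datums) is hypothetical, clauses are reduced to "key = parameter" pairs, and only equality is modelled. The first function does the clause-to-key matching once; the second is the only part that would need to run for each outer row.

```c
#include <assert.h>

#define TOY_MAX_KEYS 4          /* hypothetical stand-in for PARTITION_MAX_KEYS */

/* Toy clause of the form "partition key <keyno> = parameter <paramno>". */
typedef struct ToyClause
{
    int         keyno;          /* partition key index, or -1 if the clause
                                 * does not reference a partition key */
    int         paramno;        /* index of the parameter supplying the value */
} ToyClause;

/* Per-key result of the one-time matching step. */
typedef struct ToyKeyClauses
{
    int         nkeys;
    int         paramno[TOY_MAX_KEYS];  /* parameter matched to each key, or -1 */
} ToyKeyClauses;

/*
 * Step 1 (done once): match clauses to partition keys and remember, per
 * key, which parameter it is compared to.  Clauses that match no key are
 * dropped here.  Returns the number of keys that got a clause.
 */
static int
toy_extract_key_clauses(const ToyClause *clauses, int nclauses,
                        int nkeys, ToyKeyClauses *out)
{
    int         i;
    int         nmatched = 0;

    assert(nkeys > 0 && nkeys <= TOY_MAX_KEYS);

    out->nkeys = nkeys;
    for (i = 0; i < nkeys; i++)
        out->paramno[i] = -1;

    for (i = 0; i < nclauses; i++)
    {
        int         keyno = clauses[i].keyno;

        /* Skip clauses that do not reference a partition key. */
        if (keyno < 0 || keyno >= nkeys)
            continue;

        /* Keep only the first equality clause seen for each key. */
        if (out->paramno[keyno] < 0)
        {
            out->paramno[keyno] = clauses[i].paramno;
            nmatched++;
        }
    }

    return nmatched;
}

/*
 * Step 2 (done per outer row): fetch the current parameter values for a
 * prefix of the keys.  No clause-to-key matching happens here.  Returns
 * the number of leading keys for which a value was stored in eqkeys[].
 */
static int
toy_extract_datums(const ToyKeyClauses *kc, const int *params, int *eqkeys)
{
    int         i;

    for (i = 0; i < kc->nkeys; i++)
    {
        if (kc->paramno[i] < 0)
            break;              /* values must cover a prefix of the keys */
        eqkeys[i] = params[kc->paramno[i]];
    }

    return i;
}
```

In this sketch a caller would run toy_extract_key_clauses() once and then call toy_extract_datums() with a fresh params array for every outer row, which is the division of labour suggested above.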
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 18 January 2018 at 00:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 17 January 2018 at 23:48, Amit Langote <amitlangote09@gmail.com> wrote:
I'm concerned that after your patch to remove
match_clauses_to_partkey(), we'd be doing more work than necessary in
some cases. For example, consider the case of using run-time pruning
for nested loop where the inner relation is a partitioned table. With
the old approach, get_partitions_from_clauses() would only be handed
the clauses that are known to match the partition keys (which most
likely is fewer than all of the query's clauses), so
get_partitions_from_clauses() doesn't have to do the work of filtering
non-partition clauses every time (that is, for every outer row).
That's why I had decided to keep that part in the planner.
That might be better served by splitting
classify_partition_bounding_keys() into separate functions, the first
function would be in charge of building keyclauses_all. That way the
remaining work during the executor would never need to match clauses
to a partition key as they'd be in lists dedicated to each key.
I've attached another delta against your v20 patch which does this.
It's very rough for now and I've only checked that it passes the
regression test so far.
It will need some cleanup work, but I'd be keen to know what you think
of the general idea. I've not fully worked out how run-time pruning
will use this, as it'll need another version of
get_partitions_from_clauses that is passed a PartScanClauseInfo
instead and does not call extract_partition_key_clauses. That area
probably needs some shuffling around so that it does not end up as a big
copy and paste of all that new logic.
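One possible way to avoid the copy and paste is to have both entry points share a core function that starts from the already-extracted clause info. The toy C sketch below only illustrates that layering; it is not taken from the patch. All names are hypothetical, the partitioning is a simplified single-key range scheme where partition i holds keys in [i * width, (i + 1) * width), and the partition set is a plain bitmask rather than a Bitmapset.

```c
#include <assert.h>

/* Hypothetical, much simplified stand-in for PartScanClauseInfo. */
typedef struct ToyClauseInfo
{
    int         has_eq;         /* nonzero if an equality value is known */
    int         eq_value;       /* value compared to the partition key */
} ToyClauseInfo;

/* Toy clause "column <colno> = value"; column 0 is the partition key. */
typedef struct ToyClause
{
    int         colno;
    int         value;
} ToyClause;

/* Extraction step: pick out the first equality clause on the key. */
static void
toy_extract_clauses(const ToyClause *clauses, int nclauses,
                    ToyClauseInfo *info)
{
    int         i;

    info->has_eq = 0;
    info->eq_value = 0;

    for (i = 0; i < nclauses; i++)
    {
        if (clauses[i].colno == 0 && !info->has_eq)
        {
            info->has_eq = 1;
            info->eq_value = clauses[i].value;
        }
    }
}

/*
 * Shared core: compute the partitions to scan from extracted clause info.
 * Returns a bitmask with bit i set if partition i must be scanned.
 */
static unsigned
toy_partitions_from_clauseinfo(const ToyClauseInfo *info,
                               int nparts, int width)
{
    int         idx;

    assert(nparts > 0 && nparts < 32 && width > 0);

    /* Nothing useful known about the key, so scan every partition. */
    if (!info->has_eq)
        return (1u << nparts) - 1;

    /* Values outside every partition's range select nothing. */
    if (info->eq_value < 0)
        return 0;

    idx = info->eq_value / width;
    return (idx < nparts) ? (1u << idx) : 0;
}

/* Plan-time entry point: extract first, then call the shared core. */
static unsigned
toy_partitions_from_clauses(const ToyClause *clauses, int nclauses,
                            int nparts, int width)
{
    ToyClauseInfo info;

    toy_extract_clauses(clauses, nclauses, &info);
    return toy_partitions_from_clauseinfo(&info, nparts, width);
}
```

A run-time caller in this sketch would keep the ToyClauseInfo built at plan time and call toy_partitions_from_clauseinfo() directly, so the extraction step is never repeated. How the real executor code would supply changing parameter values is left open here, as it is in the message above.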
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
faster_partition_prune_v20_delta_drowley_v2.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 974febb..10124d3 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -155,7 +155,6 @@ typedef struct PartitionRangeBound
*/
typedef struct PartitionBoundCmpArg
{
- bool is_bound;
union
{
PartitionListValue *lbound;
@@ -165,6 +164,7 @@ typedef struct PartitionBoundCmpArg
Datum *datums;
int ndatums;
+ bool is_bound;
} PartitionBoundCmpArg;
/*
@@ -177,7 +177,7 @@ typedef struct PartClause
Expr *constarg;
/* cached info. */
- bool valid_cache; /* Is the following information initialized? */
+ bool valid_cache; /* Are the following fields populated? */
int op_strategy;
Oid op_subtype;
FmgrInfo op_func;
@@ -195,49 +195,64 @@ typedef enum PartOpStrategy
} PartOpStrategy;
/*
+ */
+typedef struct PartScanClauseInfo
+{
+ /* Lists of clauses indexed by partition key */
+ List *clauses[PARTITION_MAX_KEYS];
+
+ List *or_clauses; /* List of clauses found in an OR branch */
+ List *ne_clauses; /* Clauses in the form partkey <> Expr */
+
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* Stored data is known to contain impossible contradictions */
+ bool constfalse;
+} PartScanClauseInfo;
+
+/*
* PartScanKeyInfo
* Information about partition look up keys to be passed to
* get_partitions_for_keys()
*
- * This information is extracted from the query's mutually conjunctive operator
- * clauses, each of whose variable argument is matched to a partition key and
- * operator is checked to be contained in the corresponding column's partition
- * operator family.
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
*/
typedef struct PartScanKeyInfo
{
/*
- * Equality look up key. Values in the following array appear in no
- * particular order (unlike minkeys and maxkeys below which must appear in
- * the same order as the partition key columns). n_eqkeys must be equal to
- * the number of partition keys to be valid (except in the case of hash
- * partitioning where that's not required). When set, minkeys and maxkeys
- * are ignored.
+ * Equality look up key. Used to store known Datums values from clauses
+ * compared by an equality operation to the partition key.
*/
Datum eqkeys[PARTITION_MAX_KEYS];
- int n_eqkeys;
/*
- * Lower and upper bounds on a sequence of selected partitions. Values in
- * the following arrays must appear in the same order as the partition key
- * columns and may contain values for only a prefix of the partition key
- * columns. If *_incl is true then the corresponding bound is inclusive
- * and hence the partition into which the bound falls is to be included in
- * the set of selected partitions.
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
*/
Datum minkeys[PARTITION_MAX_KEYS];
- int n_minkeys;
- bool min_incl;
-
Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
bool max_incl;
/*
- * Information about nullness of partition keys, either specified
+ * Information about nullness of the partition keys, either specified
* explicitly in the query (in the form of a IS [NOT] NULL clause) or
- * implied due to the assumption of strictness of the partitioning
- * operators.
+ * implied from strict clauses matching the partition key.
*/
Bitmapset *keyisnull;
Bitmapset *keyisnotnull;
@@ -293,17 +308,15 @@ PG_FUNCTION_INFO_V1(satisfies_hash_partition);
static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
int rt_index, List *clauses);
-static Bitmapset *get_partitions_excluded_by(Relation relation,
- List *ne_clauses);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(Relation relation,
+ List *ne_clauses);
static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
int rt_index, List *or_clause_args);
-static bool classify_partition_bounding_keys(Relation relation, List *clauses,
- int rt_index,
- PartScanKeyInfo *keys, bool *constfalse,
- List **or_clauses, List **ne_clauses);
-static void remove_redundant_clauses(PartitionKey partkey,
- int partkeyidx, List *all_clauses,
- List **result, bool *constfalse);
+static bool extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index, PartScanClauseInfo *partclauses);
+static bool extract_bounding_datums(PartitionKey partkey,
+ PartScanClauseInfo *partclauses,
+ PartScanKeyInfo *keys);
static bool partition_cmp_args(PartitionKey key, int partkeyidx,
PartClause *op, PartClause *leftarg, PartClause *rightarg,
bool *result);
@@ -311,6 +324,8 @@ static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
bool *incl);
static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value);
+static void remove_redundant_clauses(PartitionKey partkey,
+ PartScanClauseInfo *partclauses);
static Bitmapset *get_partitions_for_keys(Relation rel,
PartScanKeyInfo *keys);
static Bitmapset *get_partitions_for_keys_hash(Relation rel,
@@ -1692,25 +1707,30 @@ get_partition_qual_relid(Oid relid)
/*
* get_partitions_from_clauses
- * Determine the set of partitions of 'relation' that will satisfy all
- * the clauses contained in 'partclauses'
+ * Determine all partitions of 'relation' that could possibly contain a
+ * record that matches 'partclauses'
*
- * Outputs:
- * A Bitmapset containing indexes of all selected partitions.
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
*/
Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
List *partclauses)
{
- Bitmapset *result;
- List *partconstr;
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
- PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo;
+ List *clauses;
- Assert(partclauses != NIL);
+ /* All partitions match if there are no clauses */
+ if (!partclauses)
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(partclauses);
+ boundinfo = partdesc->boundinfo;
/*
- * If relation is a partition itself, add its partition constraint
+ * If relation is a sub-partitioned table, add its partition constraint
* clauses to the list of clauses to use for partition pruning. This
* is done to facilitate correct decision regarding the default
* partition. Adding the partition constraint clauses to the list helps
@@ -1719,19 +1739,20 @@ get_partitions_from_clauses(Relation relation, int rt_index,
* set of selected partitions for a query whose clauses select a key space
* bigger than the partition's.
*/
- if (partition_bound_has_default(boundinfo) &&
- (partconstr = RelationGetPartitionQual(relation)) != NIL)
+ if (partition_bound_has_default(boundinfo))
{
- partconstr = (List *) expression_planner((Expr *) partconstr);
+ List *partqual = RelationGetPartitionQual(relation);
- /* Be careful not to modify the input list. */
- partclauses = list_concat(list_copy(partclauses), partconstr);
- }
+ partqual = (List *) expression_planner((Expr *) partqual);
- result = get_partitions_from_clauses_recurse(relation, rt_index,
- partclauses);
+ /* Fix Vars to have the desired varno */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partqual, 1, rt_index, 0);
- return result;
+ clauses = list_concat(clauses, partqual);
+ }
+
+ return get_partitions_from_clauses_recurse(relation, rt_index, clauses);
}
/* Module-local functions */
@@ -1747,59 +1768,75 @@ static Bitmapset *
get_partitions_from_clauses_recurse(Relation relation, int rt_index,
List *clauses)
{
- Bitmapset *result;
- PartScanKeyInfo keys;
- bool constfalse;
- List *or_clauses,
- *ne_clauses;
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartScanClauseInfo partclauses;
+ Bitmapset *result;
+ PartScanKeyInfo keys;
ListCell *lc;
- /*
- * Try to reduce the set of clauses into a form that
- * get_partitions_for_keys() can work with.
- */
- if (classify_partition_bounding_keys(relation, clauses, rt_index,
- &keys, &constfalse,
- &or_clauses, &ne_clauses))
+ /* Populate partclauses from the clause list */
+ if (extract_partition_key_clauses(partkey, clauses, rt_index, &partclauses))
{
/*
- * classify_partition_bounding_keys() may have found clauses marked
- * pseudo-constant that are false that the planner didn't or it may
- * have itself found contradictions among clauses.
+ * No partitions to scan if extract_partition_key_clauses found some
+ * clause contradiction.
*/
- if (constfalse)
+ if (partclauses.constfalse)
+ return NULL;
+
+ /* collapse clauses down to the most restrictive set */
+ remove_redundant_clauses(partkey, &partclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauses.constfalse)
return NULL;
- result = get_partitions_for_keys(relation, &keys);
+ if (extract_bounding_datums(partkey, &partclauses, &keys))
+ {
+ result = get_partitions_for_keys(relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * We found nothing useful to indicate which partitions might need to
+ * be scanned. Perhaps we'll find something below that indicates
+ * which ones won't need to be scanned.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
}
else
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
-
+ /*
+ * no useful key clauses found, but we might still be able to
+ * eliminate some partitions with ne_clauses or or_clauses.
+ */
result = bms_add_range(NULL, 0, partdesc->nparts - 1);
}
- /*
- * No point in trying to look at other conjunctive clauses, if we got
- * an empty set in the first place.
- */
- if (bms_is_empty(result))
- return NULL;
-
/* Select partitions by applying the clauses containing <> operators. */
- if (ne_clauses)
+ if (partclauses.ne_clauses)
{
Bitmapset *ne_clause_parts;
- ne_clause_parts = get_partitions_excluded_by(relation, ne_clauses);
+ ne_clause_parts = get_partitions_excluded_by_ne_clauses(relation,
+ partclauses.ne_clauses);
- /* Remove any matched partitions */
+ /* Remove any partitions we found to not be needed */
result = bms_del_members(result, ne_clause_parts);
bms_free(ne_clause_parts);
}
/* Select partitions by applying OR clauses. */
- foreach(lc, or_clauses)
+ foreach(lc, partclauses.or_clauses)
{
BoolExpr *or = (BoolExpr *) lfirst(lc);
Bitmapset *or_parts;
@@ -1820,14 +1857,14 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
}
/*
- * get_partitions_excluded_by
+ * get_partitions_excluded_by_ne_clauses
*
* Returns a Bitmapset of partition indexes of any partition that can safely
* be removed due to 'ne_clauses' containing not-equal clauses for all
* possible values that the partition can contain.
*/
static Bitmapset *
-get_partitions_excluded_by(Relation relation, List *ne_clauses)
+get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
{
ListCell *lc;
Bitmapset *excluded_parts = NULL;
@@ -1903,10 +1940,9 @@ get_partitions_excluded_by(Relation relation, List *ne_clauses)
/*
* Now compare the counts and eliminate any partition for which we found
- * clauses for all its permitted values. We must be careful here not to
- * eliminate the default partition, but the condition below that we must
- * have found at least 1 datum will ensure that, because in the default
- * partition's case, both arrays will contain zero.
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
*/
for (i = 0; i < partdesc->nparts; i++)
{
@@ -1949,15 +1985,15 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
Bitmapset *arg_partset;
/*
- * It's possible that this clause is never true for this relation
- * due to the latter's partition constraint, which means we must
- * not add its partitions to or_partset. But the clause may not
- * contain this relation's partition key expressions (instead the
- * parent's), so we could not depend on just calling
- * get_partitions_from_clauses_recurse(relation, ...) to determine
- * that the clause indeed prunes all of the relation's partition.
- *
- * Use predicate refutation proof instead.
+ * It's possible that this clause is never true for this relation due
+ * to it contradicting the partition's constraint. In this case we
+ * must not include any partitions for this OR clause. However, this
+ * OR clause may not contain any quals matching this partition table's
+ * partition key, it may contain some belonging to a parent partition
+ * though, so we may not have all the quals here required to make use
+ * of get_partitions_from_clauses_recurse to determine the correct set
+ * of partitions, so we'll just make use of predicate_refuted_by
+ * instead.
*/
if (partconstr)
{
@@ -1979,108 +2015,65 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
/* Match partition key (partattno/partexpr) to an expression (expr). */
#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
- ((partattno) != 0 ?\
- (IsA((expr), Var) &&\
- ((Var *) (expr))->varattno == (partattno)) :\
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
equal((expr), (partexpr)))
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
/*
- * classify_partition_bounding_keys
- * Analyzes partition clauses to collect the equality key or minimum and
- * maximum bounding keys using which to look up partitions of relation.
- * Also collects information about the nullness of the individual
- * partition key columns as the partitions may have certain properties
- * with respect to null values. Keys and nullness information are stored
- * in the output argument *keys.
- *
- * Clauses in the provided list are assumed to be implicitly ANDed, each of
- * which is known to match some partition key column. They're mapped to the
- * individual key columns and for each column, we find constant values that
- * are compared to the column using operators that are compatible with
- * partitioning. For example, if there is a clause a = 4 where a is a
- * partition key column, then 4 is stored as the equality key if = is
- * partitioning equality operator. If there are clauses a > 1 and a < 5, then
- * 1 and 5 are stored as the minimum and maximum bounding keys, if > and < are
- * partitioning less and greater operators, respectively. If there are
- * multiple clauses addressing a given column, we first try to check if they
- * are mutually contradictory and set *constfalse if so. For example, if there
- * are clauses a = 1 and a = 2 in the list, then clearly both will never be
- * true. Similarly for a > 1 and a < 0. For clauses containing ordering
- * operators that are non-contradictory, we try to find the one that is the
- * most restrictive and discard others. For example, of a > 1, a > 2, and
- * a >= 5, the last one is the most restrictive and so 5 is the best minimum
- * bound (which also happens to be inclusive), so it is kept while discarding
- * both a > 1 and a > 2.
+ * extract_partition_key_clauses
+ * Process 'clauses' to extract clauses matching the partition key.
+ * This populates 'partclauses' with the set of clauses matching each
+ * key and also collects other useful clauses to assist in partition
+ * elimination, such as OR clauses and not-equal clauses. We also record
+ * which partition keys we can prove are NULL or NOT NULL.
*
- * For multi-column keys, an equality key needs to contain values corresponding
- * to *all* partition key columns in the range patitioning case, whereas it's
- * not necessary for hash partitioning. Actually, the latter requires that
- * the remaining columns are covered by IS NULL clauses, but that's not checked
- * in this function. Minimum and maximum bound keys are allowed to contain
- * values for only a prefix partition key columns.
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case the function sets partclauses's
+ * 'constfalse' to true and returns true. In this case the caller should not
+ * assume the clauses have been fully processed as we abort as soon as we find
+ * a contradicting condition.
*
- * Certain kinds of clauses are not immediately handled within this function
- * and are instead returned to the caller for further processing. That
- * includes OR clauses (both those encountered in the input list and those
- * generated from ScalarArrayOpExpr clauses in the input list that have useOr
- * set to true), which are returned to the caller in *or_clauses and clauses
- * containing a <> operator (whose negator is a valid *list* partitioning
- * equality operator), which are returned to the caller to in *ne_clauses.
- *
- * True is returned if *keys contains valid information upon return or if
- * *constfalse is set to true.
+ * The function returns false if no useful key clauses were found.
*/
static bool
-classify_partition_bounding_keys(Relation relation, List *clauses,
- int rt_index,
- PartScanKeyInfo *keys, bool *constfalse,
- List **or_clauses,
- List **ne_clauses)
+extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index,
+ PartScanClauseInfo *partclauses)
{
- PartitionKey partkey = RelationGetPartitionKey(relation);
- int i;
- ListCell *lc;
- List *keyclauses_all[PARTITION_MAX_KEYS],
- *keyclauses[PARTITION_MAX_KEYS];
- bool will_compute_keys = false;
- Bitmapset *keyisnull = NULL,
- *keyisnotnull = NULL;
- bool need_next_eq,
- need_next_min,
- need_next_max;
- int n_keynullness = 0;
-
- *or_clauses = NIL;
- *ne_clauses = NIL;
- *constfalse = false;
- memset(keyclauses_all, 0, sizeof(keyclauses_all));
+ int i;
+ ListCell *lc;
+ bool got_useful_keys = false;
+
+ memset(partclauses, 0, sizeof(PartScanClauseInfo));
foreach(lc, clauses)
{
- Expr *clause;
+ Expr *clause = (Expr *) lfirst(lc);
ListCell *partexprs_item;
- if (IsA(lfirst(lc), RestrictInfo))
+ if (IsA(clause, RestrictInfo))
{
- RestrictInfo *rinfo = lfirst(lc);
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
clause = rinfo->clause;
if (rinfo->pseudoconstant &&
!DatumGetBool(((Const *) clause)->constvalue))
{
- *constfalse = true;
+ partclauses->constfalse = true;
return true;
}
}
- else
- clause = (Expr *) lfirst(lc);
/* Get the BoolExpr's out of the way.*/
if (IsA(clause, BoolExpr))
{
if (or_clause((Node *) clause))
{
- *or_clauses = lappend(*or_clauses, clause);
+ partclauses->or_clauses = lappend(partclauses->or_clauses, clause);
continue;
}
else if (and_clause((Node *) clause))
@@ -2097,31 +2090,33 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
{
Oid partopfamily = partkey->partopfamily[i];
AttrNumber partattno = partkey->partattrs[i];
+ Oid partcoll = partkey->partcollation[i];
Expr *partexpr = NULL;
PartClause *pc;
+ Oid commutator = InvalidOid;
/*
- * A non-zero partattno refers to a simple column reference that
- * will be matched against varattno of a Var appearing the clause.
- * partattno == 0 refers to arbitrary expressions, which get the
- * current one from PartitionKey.
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
*/
if (partattno == 0)
{
if (partexprs_item == NULL)
elog(ERROR, "wrong number of partition key expressions");
- /* Copy to avoid overwriting the relcache's content. */
- partexpr = copyObject(lfirst(partexprs_item));
+ partexpr = (Expr *) lfirst(partexprs_item);
/*
- * Expressions stored in PartitionKey in the relcache all
- * contain a dummy varno (that is, 1), but we must switch to
- * the RT index of the table in this query so that it can be
- * correctly matched to the expressions coming from the query.
+ * Expressions stored for the PartitionKey in the relcache are
+ * all stored with the dummy varno of 1. Change that to what
+ * we need.
*/
if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
partexprs_item = lnext(partexprs_item);
}
@@ -2140,15 +2135,43 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
rightop = (Expr *) get_rightop(clause);
if (IsA(rightop, RelabelType))
rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches the partition key */
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
constexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
constexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
else
/* Clause does not match this partition key. */
continue;
/*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
* Handle cases where the clause's operator does not belong to
* the partitioning operator family. We currently handle two
* such cases: 1. Operators named '<>' are not listed in any
@@ -2190,30 +2213,23 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
continue;
}
- pc = palloc0(sizeof(PartClause));
+ pc = (PartClause *) palloc0(sizeof(PartClause));
pc->constarg = constexpr;
/*
- * Flip the left and right args if we have to, because the
- * code which extract the constant value to use for
- * partition-pruning expects to find it as the rightop of the
- * clause. (See below in this function.)
+ * If commutator is set to a valid Oid then we'll need to swap
+ * the left and right operands. Later code requires that the
+ * partkey is on the left side.
*/
- if (constexpr == rightop)
+ if (!OidIsValid(commutator))
pc->op = opclause;
else
{
OpExpr *commuted;
- Oid commutator = get_commutator(opclause->opno);
- /*
- * Caller must have made sure to check that the commutator
- * indeed exists.
- */
- Assert(OidIsValid(commutator));
commuted = (OpExpr *) copyObject(opclause);
commuted->opno = commutator;
- commuted->opfuncid = get_opcode(commuted->opno);
+ commuted->opfuncid = get_opcode(commutator);
commuted->args = list_make2(rightop, leftop);
pc->op = commuted;
}
@@ -2221,25 +2237,27 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
/*
* We don't turn a <> operator clause into a key right away.
* Instead, the caller will hand over such clauses to
- * get_partitions_excluded_by().
+ * get_partitions_excluded_by_ne_clauses().
*/
if (is_ne_listp)
- *ne_clauses = lappend(*ne_clauses, pc);
+ partclauses->ne_clauses = lappend(partclauses->ne_clauses,
+ pc);
else
{
- keyclauses_all[i] = lappend(keyclauses_all[i], pc);
- will_compute_keys = true;
+ partclauses->clauses[i] = lappend(partclauses->clauses[i], pc);
+ got_useful_keys = true;
/*
* Since we only allow strict operators, require keys to
* be not null.
*/
- if (bms_is_member(i, keyisnull))
+ if (bms_is_member(i, partclauses->keyisnull))
{
- *constfalse = true;
+ partclauses->constfalse = true;
return true;
}
- keyisnotnull = bms_add_member(keyisnotnull, i);
+ partclauses->keyisnotnull =
+ bms_add_member(partclauses->keyisnotnull, i);
}
}
else if (IsA(clause, ScalarArrayOpExpr))
@@ -2248,20 +2266,39 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
Oid saop_op = saop->opno;
Oid saop_opfuncid = saop->opfuncid;
Oid saop_coll = saop->inputcollid;
- Expr *leftop = linitial(saop->args),
- *rightop = lsecond(saop->args);
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
List *elem_exprs,
*elem_clauses;
ListCell *lc1;
bool negated = false;
- /* Clause does not match this partition key. */
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
/*
+ * Also useless if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle it if its negator is indeed a part of the
@@ -2343,10 +2380,10 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
elem_clauses = NIL;
foreach(lc1, elem_exprs)
{
- Const *rightop = castNode(Const, lfirst(lc1));
+ Expr *rightop = (Expr *) lfirst(lc1);
Expr *elem_clause;
- if (rightop->constisnull)
+ if (IsA(rightop, Const) && ((Const *) rightop)->constisnull)
{
NullTest *nulltest = makeNode(NullTest);
@@ -2380,7 +2417,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
* of the list that's being processed currently.
*/
if (saop->useOr)
- *or_clauses = lappend(*or_clauses,
+ partclauses->or_clauses = lappend(partclauses->or_clauses,
makeBoolExpr(OR_EXPR, elem_clauses,
-1));
else
@@ -2399,17 +2436,28 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
{
if (nulltest->nulltesttype == IS_NULL)
{
- if (bms_is_member(i, keyisnotnull))
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauses->keyisnotnull))
{
- *constfalse = true;
+ partclauses->constfalse = true;
return true;
}
- keyisnull = bms_add_member(keyisnull, i);
+ partclauses->keyisnull =
+ bms_add_member(partclauses->keyisnull, i);
}
else
- keyisnotnull = bms_add_member(keyisnotnull, i);
- n_keynullness++;
- will_compute_keys = true;
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauses->keyisnull))
+ {
+ partclauses->constfalse = true;
+ return true;
+ }
+
+ partclauses->keyisnotnull =
+ bms_add_member(partclauses->keyisnotnull, i);
+ }
+ got_useful_keys = true;
}
}
/*
@@ -2425,15 +2473,21 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
Expr *leftop,
*rightop;
- pc = palloc0(sizeof(PartClause));
+ pc = (PartClause *) palloc0(sizeof(PartClause));
if (IsA(clause, BooleanTest))
{
BooleanTest *btest = (BooleanTest *) clause;
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
leftop = btest->arg;
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
/* Clause does not match this partition key. */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
@@ -2450,6 +2504,7 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
: (Expr *) get_notclausearg((Expr *) clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
+
/* Clause does not match this partition key. */
if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
continue;
@@ -2463,40 +2518,40 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
leftop, rightop,
InvalidOid, InvalidOid);
pc->constarg = rightop;
- keyclauses_all[i] = lappend(keyclauses_all[i], pc);
- will_compute_keys = true;
+ partclauses->clauses[i] = lappend(partclauses->clauses[i],
+ pc);
+ got_useful_keys = true;
}
}
}
- /* Return if no work to do below. */
- if (!will_compute_keys)
- return false;
+ return got_useful_keys;
+}
- /*
- * Try to eliminate redundant keys. In the process, we might find out
- * that clauses are mutually contradictory and hence can never be true
- * for any rows.
- */
- memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
- for (i = 0; i < partkey->partnatts; i++)
- {
- remove_redundant_clauses(partkey, i,
- keyclauses_all[i], &keyclauses[i],
- constfalse);
- if (*constfalse)
- return true;
- }
+/*
+ * extract_bounding_datums
+ * Process 'partclauses' and populate 'keys' with all min/max/equal values
+ * that we're able to determine.
+ *
+ * For RANGE partitioning we do not need to match all partition keys. We may
+ * be able to eliminate some partitions with just a prefix of the partition
+ * keys. HASH partitioning does require that every key be matched by some
+ * combination of equality clauses and IS NULL clauses. LIST partitioning
+ * doesn't support multiple partition keys.
+ *
+ * Returns true if any keys were found during partition pruning.
+ */
+static bool
+extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
+ PartScanKeyInfo *keys)
+{
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
/*
- * Generate bounding tuple(s).
- *
- * Eventually, callers will use a function like partition_bound_bsearch()
- * to look up partitions from the clauses we matched against individual
- * partition key columns. Those function expect the lookup key to be in a
- * Datum array form, not a list-of-clauses form. So, we must construct the
- * lookup key(s) by extracting constant values out the clauses.
- *
* Based on the strategies of the clause operators (=, </<=, >/>=), try to
* construct tuples of those datums that serve as the exact look up tuple
* or tuples that serve as minimum and maximum bound. If we find datums
@@ -2516,6 +2571,8 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
memset(keys, 0, sizeof(PartScanKeyInfo));
for (i = 0; i < partkey->partnatts; i++)
{
+ List *clauselist = partclauses->clauses[i];
+
/*
* Min and max keys must constitute a prefix of the partition key and
* must appear in the same order as partition keys. Equal keys have
@@ -2531,9 +2588,9 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
if (i > keys->n_maxkeys)
need_next_max = false;
- foreach(lc, keyclauses[i])
+ foreach(lc, clauselist)
{
- PartClause *clause = lfirst(lc);
+ PartClause *clause = (PartClause *) lfirst(lc);
Expr *constarg = clause->constarg;
bool incl;
PartOpStrategy op_strategy;
@@ -2612,11 +2669,12 @@ classify_partition_bounding_keys(Relation relation, List *clauses,
keys->n_eqkeys = 0;
/* Finally, also set the keyisnull and keyisnotnull values. */
- keys->keyisnull = keyisnull;
- keys->keyisnotnull = keyisnotnull;
+ keys->keyisnull = partclauses->keyisnull;
+ keys->keyisnotnull = partclauses->keyisnotnull;
- return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
- keys->n_maxkeys > 0 || n_keynullness > 0);
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
}
/*
@@ -2732,235 +2790,264 @@ partkey_datum_from_expr(PartitionKey key, int partkeyidx,
}
/*
- * For a given partition key column, find the most restrictive of the clauses
- * contained in all_clauses that are known to match the column and add it to
- * *result.
+ * remove_redundant_clauses
+ * Collapse the 'partclauses' clause lists to remove clauses which are
+ * superseded by a clause which is more restrictive.
+ *
+ * Here we perform further processing on 'partclauses' to remove any redundant
+ * clauses. We also look for clauses which contradict one another in a way
+ * that proves the 'partclauses' cannot possibly match any partition.
+ * Impossible clauses include things like: x = 1 AND x = 2, x = 5 AND x < 5
*
- * If it is found that two clauses are mutually contradictory, *constfalse
- * is set to true before returning.
+ * We also transform 'partclauses' into the minimum set of clauses by removing
+ * any clauses which are made redundant by a more restrictive clause. For
+ * example, given x > 1 AND x > 2 AND x >= 5, the last is the most restrictive so
+ * 5 is the best minimum bound. The operator here also happens to be
+ * inclusive.
+ *
+ * If we find that two clauses contradict each other, partclauses->constfalse
+ * is set to true to alert the caller that nothing can match.
*/
static void
-remove_redundant_clauses(PartitionKey partkey, int partkeyidx,
- List *all_clauses, List **result,
- bool *constfalse)
+remove_redundant_clauses(PartitionKey partkey,
+ PartScanClauseInfo *partclauses)
{
PartClause *hash_clause,
*btree_clauses[BTMaxStrategyNumber];
ListCell *lc;
int s;
+ int i;
bool test_result;
+ List *newlist;
- *result = NIL;
-
- hash_clause = NULL;
- memset(btree_clauses, 0, sizeof(btree_clauses));
- foreach(lc, all_clauses)
+ for (i = 0; i < partkey->partnatts; i++)
{
- PartClause *cur = lfirst(lc);
+ List *all_clauses = partclauses->clauses[i];
- if (!cur->valid_cache)
- {
- Oid lefttype;
-
- get_op_opfamily_properties(cur->op->opno,
- partkey->partopfamily[partkeyidx],
- false,
- &cur->op_strategy,
- &lefttype,
- &cur->op_subtype);
- fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
- cur->valid_cache = true;
- }
+ hash_clause = NULL;
+ newlist = NIL;
- /*
- * Hash-partitioning knows only about equality. So, if we've matched
- * a clause and found another clause whose constant operand doesn't
- * match the constant operand of the former, then we have found
- * mutually contradictory clauses.
- */
- if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, all_clauses)
{
- if (hash_clause == NULL)
- hash_clause = cur;
- /* check if another clause would contradict the one we have */
- else if (partition_cmp_args(partkey, partkeyidx,
- cur, cur, hash_clause,
- &test_result))
+ PartClause *cur = (PartClause *) lfirst(lc);
+
+ if (!cur->valid_cache)
{
- if (!test_result)
- {
- *constfalse = true;
- return;
- }
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[i],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
}
- /*
- * Couldn't compare; keep hash_clause set to the previous value,
- * and add this one directly to the result. Caller would
- * arbitrarily choose one of the many and perform
- * partition-pruning with it.
- */
- else
- *result = lappend(*result, cur);
/*
- * The code below handles btree operators, so not relevant for
- * hash partitioning.
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
*/
- continue;
- }
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, i,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ newlist = lappend(newlist, cur);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
- /*
- * The code that follows closely mimics similar processing done by
- * nbtutils.c: _bt_preprocess_keys().
- *
- * btree_clauses[s] points currently best clause containing the
- * operator strategy type s+1; it is NULL if we haven't yet found
- * such a clause.
- */
- s = cur->op_strategy - 1;
- if (btree_clauses[s] == NULL)
- {
- btree_clauses[s] = cur;
- }
- else
- {
/*
- * Is this one more restrictive than what we already have?
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
*
- * Consider some examples: 1. If btree_clauses[BTLT] now contains
- * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
- * currently at btree_clauses[BTLT] will be replaced by a < 3.
- *
- * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
- * then because 5 = 7 is false, we found a mutual contradiction,
- * so we set *constfalse to true and return.
- *
- * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
- * then because 7 < 5 is false, we leave a < 5 where it is and
- * effectively discard a < 7 as being redundant.
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
*/
- if (partition_cmp_args(partkey, partkeyidx,
- cur, cur, btree_clauses[s],
- &test_result))
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
{
- /* cur is more restrictive, so replace the existing. */
- if (test_result)
- btree_clauses[s] = cur;
- else if (s == BTEqualStrategyNumber - 1)
- {
- *constfalse = true;
- return;
- }
-
- /* Old one is more restrictive, so keep around. */
+ btree_clauses[s] = cur;
}
else
{
/*
- * we couldn't determine which one is more restrictive. Keep
- * the previous one in btree_clauses[s] and push this one directly
- * to the output list.
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauses->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
*/
- *result = lappend(*result, cur);
+ if (partition_cmp_args(partkey, i,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ newlist = lappend(newlist, cur);
+ }
}
}
- }
-
- if (partkey->strategy == PARTITION_STRATEGY_HASH)
- {
- /* Note we didn't add this one to the result yet. */
- if (hash_clause)
- *result = lappend(*result, hash_clause);
- return;
- }
- /* Compare btree operator clauses across strategies. */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ newlist = lappend(newlist, hash_clause);
+ list_free(partclauses->clauses[i]);
+ partclauses->clauses[i] = newlist;
+ continue;
+ }
- /* Compare the equality clause with clauses of other strategies. */
- if (btree_clauses[BTEqualStrategyNumber - 1])
- {
- PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+ /* Compare btree operator clauses across strategies. */
- for (s = 0; s < BTMaxStrategyNumber; s++)
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
{
- PartClause *chk = btree_clauses[s];
-
- if (!chk || s == (BTEqualStrategyNumber - 1))
- continue;
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
- /*
- * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
- * is a = 5, then because 5 < 5 is false, we found contradiction.
- * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
- * eq clause is a = 3, then because 3 < 5, we no longer need
- * a < 5, because a = 3 is more restrictive.
- */
- if (partition_cmp_args(partkey, partkeyidx,
- chk, eq, chk,
- &test_result))
+ for (s = 0; s < BTMaxStrategyNumber; s++)
{
- if (!test_result)
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, i,
+ chk, eq, chk,
+ &test_result))
{
- *constfalse = true;
- return;
+ if (!test_result)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
}
- /* Discard the no longer needed clause. */
- btree_clauses[s] = NULL;
}
}
- }
-
- /*
- * Try to keep only one of <, <=.
- *
- * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
- * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
- * we discard the a <= 3 (or a <= 4) as redundant. If the latter contains
- * contains a <= 2, then because 3 <= 2 is false, we dicard a < 3 as
- * redundant.
- */
- if (btree_clauses[BTLessStrategyNumber - 1] &&
- btree_clauses[BTLessEqualStrategyNumber - 1])
- {
- PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
- *le = btree_clauses[BTLessEqualStrategyNumber - 1];
- if (partition_cmp_args(partkey, partkeyidx,
- le, lt, le,
- &test_result))
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
{
- if (test_result)
- btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
- else
- btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
}
- }
- /* Try to keep only one of >, >=. See the example above. */
- if (btree_clauses[BTGreaterStrategyNumber - 1] &&
- btree_clauses[BTGreaterEqualStrategyNumber - 1])
- {
- PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
- *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
- if (partition_cmp_args(partkey, partkeyidx,
- ge, gt, ge,
- &test_result))
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the newlist.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
{
- if (test_result)
- btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
- else
- btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ if (btree_clauses[s])
+ newlist = lappend(newlist, btree_clauses[s]);
}
- }
- /*
- * btree_clauses now contains the "best" clause or NULL for each btree
- * strategy number. Add to the result.
- */
- for (s = 0; s < BTMaxStrategyNumber; s++)
- if (btree_clauses[s])
- *result = lappend(*result, btree_clauses[s]);
+ /*
+ * Replace the old List with the new one with the redundant clauses
+ * removed.
+ */
+ list_free(partclauses->clauses[i]);
+ partclauses->clauses[i] = newlist;
+ }
}
/*
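
As a standalone illustration of the per-key reduction that remove_redundant_clauses() performs in the hunk above, here is a minimal sketch in plain C. It is not the patch's code: it assumes a single integer key with native comparisons, and the names Strategy, Clause, satisfies() and reduce_clauses() are made up for this example. It follows the same order of comparisons as the function above: first within each strategy, then equality against the others, then < against <= and > against >=.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategies <, <=, =, >=, >. */
typedef enum { ST_LT, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX } Strategy;

typedef struct { Strategy op; int value; } Clause;	/* "key op value" */

/* Does "arg op bound" hold? */
static bool
satisfies(int arg, Strategy op, int bound)
{
	switch (op)
	{
		case ST_LT: return arg < bound;
		case ST_LE: return arg <= bound;
		case ST_EQ: return arg == bound;
		case ST_GE: return arg >= bound;
		case ST_GT: return arg > bound;
		default:    return false;
	}
}

/*
 * Reduce in[0..n) to at most one clause per strategy, written to out[]
 * (which must have room for ST_MAX entries).  Returns the number of
 * clauses kept, or -1 if two clauses contradict each other.
 */
static int
reduce_clauses(const Clause *in, int n, Clause *out)
{
	const Clause *best[ST_MAX] = {NULL};
	int			nout = 0;

	assert(n >= 0);

	/* Step 1: keep the most restrictive clause within each strategy. */
	for (int i = 0; i < n; i++)
	{
		const Clause *cur = &in[i];

		if (best[cur->op] == NULL ||
			satisfies(cur->value, cur->op, best[cur->op]->value))
			best[cur->op] = cur;
		else if (cur->op == ST_EQ)
			return -1;			/* e.g. x = 1 AND x = 2 */
		/* otherwise the clause already stored is more restrictive */
	}

	/* Step 2: an equality clause either contradicts or subsumes the rest. */
	if (best[ST_EQ] != NULL)
	{
		for (int s = 0; s < ST_MAX; s++)
		{
			if (s == ST_EQ || best[s] == NULL)
				continue;
			if (!satisfies(best[ST_EQ]->value, (Strategy) s, best[s]->value))
				return -1;		/* e.g. x = 5 AND x < 5 */
			best[s] = NULL;		/* e.g. x = 3 makes x < 5 redundant */
		}
	}

	/* Step 3: keep only one of <, <=. */
	if (best[ST_LT] != NULL && best[ST_LE] != NULL)
	{
		if (satisfies(best[ST_LT]->value, ST_LE, best[ST_LE]->value))
			best[ST_LE] = NULL;
		else
			best[ST_LT] = NULL;
	}

	/* Step 4: keep only one of >, >=. */
	if (best[ST_GT] != NULL && best[ST_GE] != NULL)
	{
		if (satisfies(best[ST_GT]->value, ST_GE, best[ST_GE]->value))
			best[ST_GE] = NULL;
		else
			best[ST_GT] = NULL;
	}

	/* Emit whatever survived, in strategy order. */
	for (int s = 0; s < ST_MAX; s++)
		if (best[s] != NULL)
			out[nout++] = *best[s];

	return nout;
}
```

Given x > 1 AND x > 2 AND x >= 5, the sketch keeps only x >= 5. Like the comparisons visible in the hunk above, it never compares a lower bound against an upper bound, so a pair such as x > 10 AND x < 0 is passed through rather than reported as a contradiction.
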
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e7c7a6e..51648c8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -858,54 +858,24 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
/*
* get_append_rel_partitions
- * Return the list of partitions of rel that pass the clauses mentioned
- * in rel->baserestrictinfo. An empty list is returned if no matching
- * partitions were found.
- *
- * Returned list contains the AppendRelInfos of chosen partitions.
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
*/
static List *
get_append_rel_partitions(PlannerInfo *root,
RelOptInfo *rel,
RangeTblEntry *rte)
{
- List *partclauses;
- bool contains_const,
- constfalse;
- List *result = NIL;
- int i;
- Relation parent;
- PartitionDesc partdesc;
+ Relation partrel;
Bitmapset *partindexes;
+ List *result = NIL;
+ int i;
- /*
- * Get the clauses that match the partition key. It's also a good idea
- * to check if the matched clauses contain constant values that can be
- * used for pruning and go to get_partitions_from_clauses() only if so.
- * If rel->baserestrictinfo might contain mutually contradictory clauses,
- * also find out about that.
- */
- partclauses = match_clauses_to_partkey(root, rel, rel->baserestrictinfo,
- &contains_const, &constfalse);
+ partrel = heap_open(rte->relid, NoLock);
- /* We're done here. */
- if (constfalse)
- return NIL;
-
- parent = heap_open(rte->relid, NoLock);
- partdesc = RelationGetPartitionDesc(parent);
-
- if (partclauses != NIL && contains_const)
- partindexes = get_partitions_from_clauses(parent, rel->relid,
- partclauses);
- else
- {
- /*
- * There are no clauses that are useful to prune any partitions, so
- * scan all partitions.
- */
- partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);
- }
+ partindexes = get_partitions_from_clauses(partrel, rel->relid,
+ rel->baserestrictinfo);
/* Fetch the partition appinfos. */
i = -1;
@@ -914,328 +884,22 @@ get_append_rel_partitions(PlannerInfo *root,
AppendRelInfo *appinfo = rel->part_appinfos[i];
#ifdef USE_ASSERT_CHECKING
- RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
/*
* Must be the intended child's RTE here, because appinfos are ordered
* the same way as partitions in the partition descriptor.
*/
- Assert(partdesc->oids[i] == rte->relid);
+ Assert(partdesc->oids[i] == childrte->relid);
#endif
+
result = lappend(result, appinfo);
}
- heap_close(parent, NoLock);
-
- return result;
-}
-
-#define PartCollMatchesExprColl(partcoll, exprcoll) \
- ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
-
-/*
- * match_clauses_to_partkey
- * Match clauses with rel's partition key
- *
- * Returned list contains clauses matched to the partition key columns and
- * *contains_const and *constfalse are set as described below.
- *
- * For an individual clause to match with a partition key column, the clause
- * must be an operator clause of the form (partkey op const) or (const op
- * partkey); the latter only if a suitable commutator exists. Furthermore,
- * the operator must be strict and its input collation must match the partition
- * collation. The aforementioned "const" means any expression that doesn't
- * involve a volatile function or a Var of this relation. We allow Vars
- * belonging to other relations (for example, if the clause is a join clause),
- * but they are treated as parameters whose values are not known now, so cannot
- * be used for partition pruning right within the planner. It's the
- * responsibility of higher code levels to manage restriction and join clauses
- * appropriately. If a NullTest against a partition key is encountered, it's
- * added to the result as well.
- *
- * *contains_const is set if at least one matched clauses contains the constant
- * operand or is a Nullness test. *constfalse is set if the input list
- * contains a pseudo-constant RestrictInfo with false value.
- */
-static List *
-match_clauses_to_partkey(PlannerInfo *root,
- RelOptInfo *rel,
- List *clauses,
- bool *contains_const,
- bool *constfalse)
-{
- PartitionScheme partscheme = rel->part_scheme;
- List *result = NIL;
- ListCell *lc;
-
- *contains_const = false;
- *constfalse = false;
-
- Assert (partscheme != NULL);
-
- /* Make a copy, because we may scribble on it below. */
- clauses = list_copy(clauses);
-
- foreach(lc, clauses)
- {
- Node *member = lfirst(lc);
- Expr *clause;
- int i;
-
- if (IsA(member, RestrictInfo))
- {
- RestrictInfo *rinfo = (RestrictInfo *) member;
-
- clause = rinfo->clause;
- if (rinfo->pseudoconstant &&
- (IsA(clause, Const) &&
- ((((Const *) clause)->constisnull) ||
- !DatumGetBool(((Const *) clause)->constvalue))))
- {
- *constfalse = true;
- return NIL;
- }
- }
- else
- clause = (Expr *) member;
-
- /*
- * For a BoolExpr, we should try to match each of its args with the
- * partition key as described below for each type.
- */
- if (IsA(clause, BoolExpr))
- {
- if (or_clause((Node *) clause))
- {
- /*
- * For each of OR clause's args, call this function
- * recursively with a given arg as the only member in the
- * input list and see if it's returned as matching the
- * partition key. Add the OR clause to the result iff at
- * least one of its args contain a matching clause.
- */
- BoolExpr *orclause = (BoolExpr *) clause;
- ListCell *lc1;
- bool arg_matches_key = false,
- matched_arg_contains_const = false,
- all_args_constfalse = true;
-
- foreach (lc1, orclause->args)
- {
- Node *arg = lfirst(lc1);
- bool contains_const1,
- constfalse1;
-
- if (match_clauses_to_partkey(root, rel, list_make1(arg),
- &contains_const1,
- &constfalse1) != NIL)
- {
- arg_matches_key = true;
- matched_arg_contains_const = contains_const1;
- }
-
- /* We got at least one arg that is not constant false. */
- if (!constfalse1)
- all_args_constfalse = false;
- }
-
- if (arg_matches_key)
- {
- result = lappend(result, clause);
- *contains_const = matched_arg_contains_const;
- }
-
- /* OR clause is "constant false" if all of its args are. */
- *constfalse = all_args_constfalse;
- continue;
- }
- else if (and_clause((Node *) clause))
- {
- /*
- * Since the clause is itself implicitly ANDed with other
- * clauses in the input list, queue the args to be processed
- * later as if they were part of the original input list.
- */
- clauses = list_concat(clauses,
- list_copy(((BoolExpr *) clause)->args));
- continue;
- }
-
- /* Fall-through for a NOT clause, which is handled below. */
- }
-
- for (i = 0; i < partscheme->partnatts; i++)
- {
- Node *partkey = linitial(rel->partexprs[i]);
- Oid partopfamily = partscheme->partopfamily[i],
- partcoll = partscheme->partcollation[i];
-
- /*
- * Check if the clauses matches the partition key and add it to
- * the result list if other things such as operator input
- * collation, strictness, etc. look fine.
- */
- if (is_opclause(clause))
- {
- Expr *constexpr,
- *leftop,
- *rightop;
- Relids constrelids;
- Oid expr_op,
- expr_coll;
-
- leftop = (Expr *) get_leftop(clause);
- rightop = (Expr *) get_rightop(clause);
- expr_op = ((OpExpr *) clause)->opno;
- expr_coll = ((OpExpr *) clause)->inputcollid;
-
- if (IsA(leftop, RelabelType))
- leftop = ((RelabelType *) leftop)->arg;
- if (IsA(rightop, RelabelType))
- rightop = ((RelabelType *) rightop)->arg;
-
- if (equal(leftop, partkey))
- {
- constexpr = rightop;
- constrelids = pull_varnos((Node *) rightop);
- }
- else if (equal(rightop, partkey))
- {
- constexpr = leftop;
- constrelids = pull_varnos((Node *) leftop);
- expr_op = get_commutator(expr_op);
-
- /*
- * If no commutator exists, cannot flip the qual's args,
- * so give up.
- */
- if (!OidIsValid(expr_op))
- continue;
- }
- else
- /* Neither argument matches the partition key. */
- continue;
-
- /*
- * Useless if what we're thinking of as a constant is actually
- * a Var coming from this relation.
- */
- if (bms_is_member(rel->relid, constrelids))
- continue;
-
- /*
- * Also, useless, if the clause's collation is different from
- * the partitioning collation.
- */
- if (!PartCollMatchesExprColl(partcoll, expr_coll))
- continue;
-
- /*
- * Only allow strict operators to think sanely about the
- * behavior with null arguments.
- */
- if (!op_strict(expr_op))
- continue;
-
- /* Useless if the "constant" can change its value. */
- if (contain_volatile_functions((Node *) constexpr))
- continue;
-
- /*
- * Everything seems to be fine, so add it to the list of
- * clauses we will use for pruning.
- */
- result = lappend(result, clause);
-
- if (!*contains_const)
- *contains_const = IsA(constexpr, Const);
- }
- else if (IsA(clause, ScalarArrayOpExpr))
- {
- ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
- Oid saop_op = saop->opno;
- Oid saop_coll = saop->inputcollid;
- Node *leftop = (Node *) linitial(saop->args),
- *rightop = (Node *) lsecond(saop->args);
-
- if (IsA(leftop, RelabelType))
- leftop = (Node *) ((RelabelType *) leftop)->arg;
- if (!equal(leftop, partkey))
- continue;
-
- /* Check if saop_op is compatible with partitioning. */
- if (!op_strict(saop_op))
- continue;
-
- /* Useless if the "constant" can change its value. */
- if (contain_volatile_functions((Node *) rightop))
- continue;
-
- /*
- * Also, useless, if the clause's collation is different from
- * the partitioning collation.
- */
- if (!PartCollMatchesExprColl(partcoll, saop_coll))
- continue;
-
- /* OK to add to the result. */
- result = lappend(result, clause);
- if (IsA(eval_const_expressions(root, rightop), Const))
- *contains_const = true;
- else
- *contains_const = false;
- }
- else if (IsA(clause, NullTest))
- {
- NullTest *nulltest = (NullTest *) clause;
- Node *arg = (Node *) nulltest->arg;
-
- if (equal(arg, partkey))
- {
- result = lappend(result, nulltest);
- /* A Nullness test can be used right away. */
- *contains_const = true;
- }
- }
- /*
- * Certain Boolean conditions have a special shape, which we
- * accept if the partitioning opfamily accepts Boolean conditions.
- */
- else if (IsBooleanOpfamily(partopfamily) &&
- (IsA(clause, BooleanTest) ||
- IsA(clause, Var) || not_clause((Node *) clause)))
- {
- /*
- * Only accept those for pruning that appear to be
- * IS [NOT] TRUE/FALSE.
- */
- if (IsA(clause, BooleanTest))
- {
- BooleanTest *btest = (BooleanTest *) clause;
- Expr *arg = btest->arg;
-
- if (btest->booltesttype != IS_UNKNOWN &&
- btest->booltesttype != IS_NOT_UNKNOWN &&
- equal((Node *) arg, partkey))
- result = lappend(result, clause);
- }
- else if (IsA(clause, Var))
- {
- if (equal((Node *) clause, partkey))
- result = lappend(result, clause);
- }
- else
- {
- Node *arg = (Node *) get_notclausearg((Expr *) clause);
-
- if (equal(arg, partkey))
- result = lappend(result, clause);
- }
-
- *contains_const = true;
- }
- }
- }
+ heap_close(partrel, NoLock);
return result;
}
Hi David.
On 2018/01/18 12:14, David Rowley wrote:
On 18 January 2018 at 00:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 17 January 2018 at 23:48, Amit Langote <amitlangote09@gmail.com> wrote:
I'm concerned that after your patch to remove
match_clauses_to_partkey(), we'd be doing more work than necessary in
some cases. For example, consider the case of using run-time pruning
for nested loop where the inner relation is a partitioned table. With
the old approach, get_partitions_from_clauses() would only be handed
the clauses that are known to match the partition keys (which most
likely is fewer than all of the query's clauses), so
get_partitions_from_clauses() doesn't have to do the work of filtering
non-partition clauses every time (that is, for every outer row).
That's why I had decided to keep that part in the planner.

That might be better served by splitting
classify_partition_bounding_keys() into separate functions, the first
function would be in charge of building keyclauses_all. That way the
remaining work during the executor would never need to match clauses
to a partition key as they'd be in lists dedicated to each key.

I've attached another delta against your v20 patch which does this.
It's very rough for now and I've only checked that it passes the
regression test so far.
Thanks!
It will need some cleanup work, but I'd be keen to know what you think
of the general idea.
This one looks to be in much better shape.
I've not fully worked out how run-time pruning
will use this, as it'll need another version of
get_partitions_from_clauses that is passed a PartScanClauseInfo
instead and does not call extract_partition_key_clauses. That area
probably needs some shuffling around so that it does not end up as a big
copy and paste of all that new logic.
So, I've been assuming that the planner changes in the run-time pruning
patch have to do with selecting clauses (restriction clauses not
containing Consts and/or join clauses) to be passed to the executor by
recording them in the Append node. Will they be selected by the planner
calling into partition.c?
Meanwhile, here is a revised version (v21) that incorporates your changes.
I added you as the author in 0002 and 0005 patches. I guess a v22 will
have to follow very soon...
Thanks,
Amit
Attachments:
v21-0001-Some-interface-changes-for-partition_bound_-cmp-.patch (text/plain; charset=UTF-8)
From 27a8069d5f7de8a888a8f083b6448e6e5d86a368 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v21 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 166 +++++++++++++++++++++++++++++-----------
1 file changed, 123 insertions(+), 43 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..1edbf66eae 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg may contain either a partition bound struct or a Datum array
+ * representing the partition key of a tuple being routed. We simply pass
+ * that down to partition_bound_cmp where it is interpreted appropriately.
*
- * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *is_equal is set to whether the bound at the returned index is exactly
+ * equal to *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
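
For readers following along without the tree checked out, here is a rough standalone sketch of the search that 0001 re-plumbs: find the greatest bound that is less than or equal to the probe, with the probe described by a single tagged struct instead of a void pointer plus a flag. This is only an illustration under simplifying assumptions. ProbeArg, bound_cmp and bound_bsearch are hypothetical names, bounds are plain ints rather than Datums, and only the lo/hi/mid arithmetic and the "no datum means the bound is greater" rule are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PartitionBoundCmpArg: one tagged probe. */
typedef struct ProbeArg
{
	bool		is_bound;	/* true: probe is *bound; false: probe is datums[] */
	const int  *bound;		/* a new bound being checked, when is_bound */
	const int  *datums;		/* key values of a tuple, when !is_bound */
	int			ndatums;	/* number of valid entries in datums */
} ProbeArg;

/* Returns <0, 0 or >0 as bounds[offset] is <, = or > the probe in *arg. */
static int
bound_cmp(const int *bounds, int offset, const ProbeArg *arg)
{
	int			probe;

	if (arg->is_bound)
		probe = *arg->bound;
	else
	{
		/* With no datum to compare, consider the stored bound greater. */
		if (arg->ndatums < 1)
			return 1;
		probe = arg->datums[0];
	}

	return (bounds[offset] > probe) - (bounds[offset] < probe);
}

/*
 * Returns the index of the greatest bound that is <= the probe, or -1 if
 * every bound is greater.  *is_equal reports an exact match at that index.
 * 'bounds' must be sorted in ascending order.
 */
static int
bound_bsearch(const int *bounds, int nbounds, const ProbeArg *arg,
			  bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(arg != NULL && is_equal != NULL);
	*is_equal = false;

	while (lo < hi)
	{
		/* Round up so that lo = mid always makes progress. */
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = bound_cmp(bounds, mid, arg);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}

	return lo;
}
```

A caller routing a tuple would set is_bound to false and fill datums/ndatums, while a caller checking a new bound for overlap would set is_bound to true. Either way, the search loop itself never needs to know which kind of probe it was given.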
v21-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 184a5457666ba5636cc5f41da7c56537ee97b453 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v21 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns the index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of query clauses and
return a set of indexes of the partitions that satisfy all of those
clauses.
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Callers must have checked that each of the clauses matches one of the
partition keys.
Authors: Amit Langote, David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 2077 +++++++++++++++++++++++++++++++++-
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 3 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/optimizer/clauses.h | 2 +
5 files changed, 2085 insertions(+), 4 deletions(-)
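
As a reading aid for the diff that follows, the way the per-clause results are combined into "the partitions that satisfy all of those clauses" can be sketched with plain bitmasks. This is a simplified illustration, not code from the patch. PartSet, all_parts, or_parts and prune are hypothetical names, a uint32_t stands in for Bitmapset (so at most 32 partitions), and the individual sets are assumed to have been computed already.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set => partition i is kept. */
typedef uint32_t PartSet;

/* Set containing partitions 0 .. nparts-1, used when nothing prunes. */
static PartSet
all_parts(int nparts)
{
	assert(nparts > 0 && nparts <= 32);
	return nparts == 32 ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/* An OR clause keeps a partition if any one of its args keeps it: union. */
static PartSet
or_parts(const PartSet *arg_sets, int nargs)
{
	PartSet		result = 0;

	for (int i = 0; i < nargs; i++)
		result |= arg_sets[i];
	return result;
}

/*
 * Top-level clauses are implicitly ANDed.  Start from the partitions
 * selected by the key clauses, remove those excluded by <> clauses, then
 * intersect with the set produced by each OR clause.
 */
static PartSet
prune(int nparts, PartSet key_parts, PartSet ne_excluded,
	  const PartSet *or_sets, int nor)
{
	PartSet		result = key_parts & all_parts(nparts);

	result &= ~ne_excluded;
	for (int i = 0; i < nor; i++)
		result &= or_sets[i];
	return result;
}
```

With four partitions, key clauses selecting {1,2,3}, a <> clause excluding {1}, and one OR clause whose args select {2} and {3}, prune() leaves {2,3}. An empty result corresponds to returning NULL in the patch.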
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1edbf66eae..b35e35cfb6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -151,7 +155,6 @@ typedef struct PartitionRangeBound
*/
typedef struct PartitionBoundCmpArg
{
- bool is_bound;
union
{
PartitionListValue *lbound;
@@ -161,8 +164,101 @@ typedef struct PartitionBoundCmpArg
Datum *datums;
int ndatums;
+ bool is_bound;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ OpExpr *op;
+ Expr *constarg;
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * Clauses matched to partition keys
+ */
+typedef struct PartScanClauseInfo
+{
+ /* Lists of clauses indexed by partition key */
+ List *clauses[PARTITION_MAX_KEYS];
+
+ List *or_clauses; /* List of clauses found in an OR branch */
+ List *ne_clauses; /* Clauses in the form partkey <> Expr */
+
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* Stored data is known to contain impossible contradictions */
+ bool constfalse;
+} PartScanClauseInfo;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datums values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +307,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
+ int rt_index, List *clauses);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(Relation relation,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
+ int rt_index, List *or_clause_args);
+static bool extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index, PartScanClauseInfo *partclauses);
+static bool extract_bounding_datums(PartitionKey partkey,
+ PartScanClauseInfo *partclauses,
+ PartScanKeyInfo *keys);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static void remove_redundant_clauses(PartitionKey partkey,
+ PartScanClauseInfo *partclauses);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1706,1959 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_from_clauses
+ * Determine all partitions of 'relation' that could possibly contain a
+ * record that matches 'partclauses'
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo;
+ List *clauses;
+
+ /* All partitions match if there are no clauses */
+ if (!partclauses)
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(partclauses);
+ boundinfo = partdesc->boundinfo;
+
+ /*
+ * If relation is a sub-partitioned table, add its partition constraint
+ * clauses to the list of clauses to use for partition pruning. This
+ * is done to facilitate correct decision regarding the default
+ * partition. Adding the partition constraint clauses to the list helps
+ * restrict the possible key space to only that allowed by the partition
+ * and thus avoids the default partition being inadvertently added to the
+ * set of selected partitions for a query whose clauses select a key space
+ * bigger than the partition's.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ List *partqual = RelationGetPartitionQual(relation);
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partqual, 1, rt_index, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ return get_partitions_from_clauses_recurse(relation, rt_index, clauses);
+}
+
/* Module-local functions */
/*
+ * get_partitions_from_clauses_recurse
+ * Determine relation's partitions that satisfy *all* of the clauses
+ * in the list
+ *
+ * Return value is a Bitmapset containing the indexes of selected partitions.
+ */
+static Bitmapset *
+get_partitions_from_clauses_recurse(Relation relation, int rt_index,
+ List *clauses)
+{
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartScanClauseInfo partclauses;
+ Bitmapset *result;
+ PartScanKeyInfo keys;
+ ListCell *lc;
+
+ /* Populate partclauses from the clause list */
+ if (extract_partition_key_clauses(partkey, clauses, rt_index, &partclauses))
+ {
+ /*
+ * No partitions to scan if extract_partition_key_clauses found some
+ * clause contradiction.
+ */
+ if (partclauses.constfalse)
+ return NULL;
+
+ /* collapse clauses down to the most restrictive set */
+ remove_redundant_clauses(partkey, &partclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauses.constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(partkey, &partclauses, &keys))
+ {
+ result = get_partitions_for_keys(relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we got
+ * an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * We found nothing useful to indicate which partitions might need to
+ * be scanned. Perhaps we'll find something below that indicates
+ * which ones won't need to be scanned.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ }
+ else
+ {
+ /*
+ * no useful key clauses found, but we might still be able to
+ * eliminate some partitions with ne_clauses or or_clauses.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (partclauses.ne_clauses)
+ {
+ Bitmapset *ne_clause_parts;
+
+ ne_clause_parts = get_partitions_excluded_by_ne_clauses(relation,
+ partclauses.ne_clauses);
+
+ /* Remove any partitions we found to not be needed */
+ result = bms_del_members(result, ne_clause_parts);
+ bms_free(ne_clause_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, partclauses.or_clauses)
+ {
+ BoolExpr *or = (BoolExpr *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_clause_args(relation, rt_index,
+ or->args);
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_clauses
+ *
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *excluded_parts = NULL;
+ Bitmapset *foundoffsets = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
+
+ memset(&arg, 0, sizeof(arg));
+
+ /*
+ * Build a Bitmapset to record the indexes of all datums of the
+ * query that are found in boundinfo.
+ */
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ arg.datums = &datum;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums we found in the query, and another to record the number of
+ * datums permitted in each partition. Once we've counted all this, we
+ * can eliminate any partition where the number of datums found matches
+ * the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(Relation relation, int rt_index,
+ List *or_clause_args)
+{
+ ListCell *lc;
+ Bitmapset *result = NULL;
+
+ foreach(lc, or_clause_args)
+ {
+ List *arg_clauses = list_make1(lfirst(lc));
+ List *partconstr = RelationGetPartitionQual(relation);
+ Bitmapset *arg_partset;
+
+ /*
+ * It's possible that this clause is never true for this relation due
+ * to it contradicting the partition's constraint. In this case we
+ * must not include any partitions for this OR clause. However, this
+ * OR clause may not contain any quals matching this partitioned
+ * table's partition key; it may contain only quals matching a parent's
+ * partition key. We therefore may not have all the quals required for
+ * get_partitions_from_clauses_recurse to determine the correct set
+ * of partitions, so we'll just make use of predicate_refuted_by
+ * instead.
+ */
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
+ if (predicate_refuted_by(partconstr, arg_clauses, false))
+ continue;
+ }
+
+ arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
+ arg_clauses);
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
+ equal((expr), (partexpr)))
+
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_key_clauses
+ * Process 'clauses' to extract clauses matching the partition key.
+ * This populates 'partclauses' with the set of clauses matching each
+ * key and also collects other useful clauses to assist in partition
+ * elimination, such as OR clauses and not-equal clauses. We also record
+ * which partition keys we can prove are NULL or NOT NULL.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case the function sets the
+ * 'constfalse' field of 'partclauses' to true and returns true. The caller
+ * should then not assume the clauses have been fully processed, as we abort
+ * as soon as we find a contradicting condition.
+ *
+ * The function returns false if no useful key clauses were found.
+ */
+static bool
+extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index,
+ PartScanClauseInfo *partclauses)
+{
+ int i;
+ ListCell *lc;
+ bool got_useful_keys = false;
+
+ memset(partclauses, 0, sizeof(PartScanClauseInfo));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ ListCell *partexprs_item;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauses->constfalse = true;
+ return true;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauses->or_clauses = lappend(partclauses->or_clauses, clause);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ Oid partopfamily = partkey->partopfamily[i];
+ AttrNumber partattno = partkey->partattrs[i];
+ Oid partcoll = partkey->partcollation[i];
+ Expr *partexpr = NULL;
+ PartClause *pc;
+ Oid commutator = InvalidOid;
+
+ /*
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ partexpr = (Expr *) lfirst(partexprs_item);
+
+ /*
+ * Expressions stored for the PartitionKey in the relcache are
+ * all stored with the dummy varno of 1. Change that to what
+ * we need.
+ */
+ if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *constexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches the partition key */
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ constexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
+ constexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* Useless if the "constant" can change its value. */
+ if (contain_volatile_functions((Node *) constexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ int strategy;
+ Oid negator,
+ lefttype,
+ righttype;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->constarg = constexpr;
+
+ /*
+ * If commutator is set to a valid Oid then we'll need to swap
+ * the left and right operands. Later code requires that the
+ * partkey is on the left side.
+ */
+ if (!OidIsValid(commutator))
+ pc->op = opclause;
+ else
+ {
+ OpExpr *commuted;
+
+ commuted = (OpExpr *) copyObject(opclause);
+ commuted->opno = commutator;
+ commuted->opfuncid = get_opcode(commutator);
+ commuted->args = list_make2(rightop, leftop);
+ pc->op = commuted;
+ }
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauses->ne_clauses = lappend(partclauses->ne_clauses,
+ pc);
+ else
+ {
+ partclauses->clauses[i] = lappend(partclauses->clauses[i], pc);
+ got_useful_keys = true;
+
+ /*
+ * Since we only allow strict operators, require keys to
+ * be not null.
+ */
+ if (bms_is_member(i, partclauses->keyisnull))
+ {
+ partclauses->constfalse = true;
+ return true;
+ }
+ partclauses->keyisnotnull =
+ bms_add_member(partclauses->keyisnotnull, i);
+ }
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family, but we are able to
+ * handle it if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int elemno;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ /* Use a separate index; 'i' is the partition key loop counter. */
+ for (elemno = 0; elemno < num_elems; elemno++)
+ {
+ if (!elem_nulls[elemno])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[elemno],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1);
+ Expr *elem_clause;
+
+ if (IsA(rightop, Const) && ((Const *) rightop)->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ partclauses->or_clauses = lappend(partclauses->or_clauses,
+ makeBoolExpr(OR_EXPR, elem_clauses,
+ -1));
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauses->keyisnotnull))
+ {
+ partclauses->constfalse = true;
+ return true;
+ }
+ partclauses->keyisnull =
+ bms_add_member(partclauses->keyisnull, i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauses->keyisnull))
+ {
+ partclauses->constfalse = true;
+ return true;
+ }
+
+ partclauses->keyisnotnull =
+ bms_add_member(partclauses->keyisnotnull, i);
+ }
+ got_useful_keys = true;
+ }
+ }
+ /*
+ * Boolean conditions have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
+ BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, InvalidOid);
+ pc->constarg = rightop;
+ partclauses->clauses[i] = lappend(partclauses->clauses[i],
+ pc);
+ got_useful_keys = true;
+ }
+ }
+ }
+
+ return got_useful_keys;
+}
+
+/*
+ * extract_bounding_datums
+ * Process 'partclauses' and populate 'keys' with all min/max/equal values
+ * that we're able to determine.
+ *
+ * For RANGE partitioning we do not need to match all partition keys. We may
+ * be able to eliminate some partitions with just a prefix of the partition
+ * keys. HASH partitioning does require that all keys be matched by some
+ * combination of equality clauses and IS NULL clauses. LIST partitioning
+ * doesn't support multiple partition keys.
+ *
+ * Returns true if any keys were found during partition pruning.
+ */
+static bool
+extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
+ PartScanKeyInfo *keys)
+{
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clause operators (=, </<=, >/>=), try to
+ * construct tuples of those datums that serve as the exact look up tuple
+ * or tuples that serve as minimum and maximum bound. If we find datums
+ * for all partition key columns that appear in = operator clauses, then
+ * we have the exact match look up tuple, which will be used to match just
+ * one partition. If the last datum in a tuple comes from a clause
+ * containing </<= or >/>= operator, then that constitutes the minimum
+ * or maximum bound tuple, respectively. There is one exception -- if
+ * we have a tuple containing values for only a prefix of partition key
+ * columns, where none of its values come from a </<= or >/>= operator
+ * clause, we still consider such tuple as both the minimum and maximum
+ * bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *clauselist = partclauses->clauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *constarg = clause->constarg;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, constarg,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing equality
+ * operators for all partition key columns; if we did, we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = partclauses->keyisnull;
+ keys->keyisnotnull = partclauses->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'op' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+{
+ PartOpStrategy result;
+
+ *incl = false; /* overwritten as appropriate below */
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (op->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ result = PART_OP_EQUAL;
+ }
+ break;
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (op->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ result = PART_OP_LESS;
+ break;
+ case BTEqualStrategyNumber:
+ *incl = true;
+ result = PART_OP_EQUAL;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ result = PART_OP_GREATER;
+ break;
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * remove_redundant_clauses
+ * Collapse the 'partclauses' clauses lists to remove clauses which are
+ * superseded by a clause which is more restrictive.
+ *
+ * Here we perform further processing on 'partclauses' to remove any redundant
+ * clauses. We also look for clauses which contradict one another in a way
+ * that proves the 'partclauses' cannot possibly match any partition.
+ * Impossible clauses include things like: x = 1 AND x = 2, x = 5 AND x < 5
+ *
+ * We also transform 'partclauses' into the minimum set of clauses by removing
+ * any clauses which are made redundant by a more restrictive clause. For
+ * example, x > 1 AND x > 2 and x >= 5, the latter is the most restrictive so
+ * 5 is the best minimum bound. The operator here also happens to be
+ * inclusive.
+ *
+ * If we find that two clauses contradict each other then 'partclauses'
+ * constfalse is set to true to alert the caller that nothing can match.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ PartScanClauseInfo *partclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+ List *newlist;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *all_clauses = partclauses->clauses[i];
+
+ hash_clause = NULL;
+ newlist = NIL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, all_clauses)
+ {
+ PartClause *cur = (PartClause *) lfirst(lc);
+
+ if (!cur->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(cur->op->opno,
+ partkey->partopfamily[i],
+ false,
+ &cur->op_strategy,
+ &lefttype,
+ &cur->op_subtype);
+ fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
+ cur->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = cur;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, i,
+ cur, cur, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ newlist = lappend(newlist, cur);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = cur->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = cur;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauses->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, i,
+ cur, cur, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = cur;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ newlist = lappend(newlist, cur);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ newlist = lappend(newlist, hash_clause);
+ list_free(partclauses->clauses[i]);
+ partclauses->clauses[i] = newlist;
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, i,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauses->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the newlist.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ newlist = lappend(newlist, btree_clauses[s]);
+ }
+
+ /*
+ * Replace the old List with the new one with the redundant clauses
+ * removed.
+ */
+ list_free(partclauses->clauses[i]);
+ partclauses->clauses[i] = newlist;
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'op' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ * We may not be able to perform the comparison if operand values are
+ * unavailable and/or types of operands are incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Datum leftarg_const,
+ rightarg_const;
+
+ Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+ /* Get the constant values from the operands */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->constarg, &leftarg_const))
+ return false;
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->constarg, &rightarg_const))
+ return false;
+
+ /*
+ * We can compare leftarg_const and rightarg_const using op's operator
+ * only if both are of the type expected by it.
+ */
+ if (leftarg->op_subtype == op->op_subtype &&
+ rightarg->op_subtype == op->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ else
+ {
+ /* Otherwise, look one up in the partitioning operator family. */
+ Oid cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ op->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ op->op->inputcollid,
+ leftarg_const,
+ rightarg_const));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions of 'rel' that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_bound_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_bound_bsearch() works. If the bound at maxoff exactly
+ * matches maxkeys (is_equal), but maxkeys is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_bound_bsearch would've returned the offset of just one of
+ * those. If minkeys is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->min_incl
+ ? minoff - 1 : minoff + 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->max_incl
+ ? maxoff + 1 : maxoff - 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but that may not be the case. It might actually be
+ * the upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. However, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..8423c6e886 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -73,4 +73,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
+ List *partclauses);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
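The min/max offset handling in get_partitions_for_keys_list() above can be illustrated in isolation. What follows is only a minimal sketch, assuming a single integer partition key and a sorted array of distinct bound values; bound_bsearch() and list_bounds_in_range() are hypothetical stand-ins written for this illustration, not functions from the patch, and the default and NULL partitions are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for partition_bound_bsearch(): returns the offset
 * of the greatest bound that is <= key, or -1 if every bound is greater.
 * *is_equal tells whether the bound at the returned offset equals key.
 */
static int
bound_bsearch(const int *bounds, int nbounds, int key, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= key)
		{
			lo = mid;
			*is_equal = (bounds[mid] == key);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}

/*
 * Compute the inclusive range [*minoff, *maxoff] of bound offsets that
 * satisfy "minkey (< or <=) bound (< or <=) maxkey".  Returns false when
 * no bound qualifies.
 */
static bool
list_bounds_in_range(const int *bounds, int nbounds,
					 int minkey, bool min_incl,
					 int maxkey, bool max_incl,
					 int *minoff, int *maxoff)
{
	bool		is_equal;

	*minoff = bound_bsearch(bounds, nbounds, minkey, &is_equal);
	if (*minoff >= 0)
	{
		/* Bound at minoff is <= minkey; step right unless it qualifies. */
		if (!is_equal || !min_incl)
			(*minoff)++;
	}
	else
		*minoff = 0;			/* all bounds are greater than minkey */

	*maxoff = bound_bsearch(bounds, nbounds, maxkey, &is_equal);
	/* Bound at maxoff is <= maxkey; step left if equal but exclusive. */
	if (*maxoff >= 0 && is_equal && !max_incl)
		(*maxoff)--;

	return *minoff <= nbounds - 1 && *maxoff >= 0 && *minoff <= *maxoff;
}
```

With bounds {10, 20, 30, 40}, the condition 15 <= key < 30 selects only offset 1 (the bound 20), while key > 40 selects no bound at all, which in the patch is the case where only the default partition, if any, is returned.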
Attachment: v21-0003-Move-some-code-of-set_append_rel_size-to-separat.patch (text/plain; charset=UTF-8)
From bd48279e33539fe1dca3851b349899abed9fc451 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v21 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition RelOptInfo
from the information in the parent's RelOptInfo into a separate function.
It will need to be called by the pairwise-join related code to minimally
initialize the partitions that earlier planning would have considered
pruned and hence left untouched. That is not the case currently, because
the current pruning method touches each partition (setting its basic
properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c5304b712e..fee078a9c7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such
+ * as target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 725694f570..9b4288ad92 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -300,5 +300,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
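The attr_needed loop that this patch moves into set_basic_child_rel_properties() can be read more easily with the planner types stripped away. The following is only a sketch under stated assumptions: SketchRel, copy_attr_needed() and the translated_attno array are hypothetical names invented here, an unsigned bitmask stands in for Relids, and a 0 entry in translated_attno stands in for the NULL Var of a dropped parent column.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the relevant RelOptInfo fields. */
typedef struct SketchRel
{
	int			min_attr;		/* lowest attribute number (<= 0) */
	int			max_attr;		/* highest attribute number */
	unsigned   *attr_needed;	/* indexed by attno - min_attr */
} SketchRel;

/*
 * Copy the parent's attr_needed entries to the child.  System attributes
 * (attno <= 0) keep their positions; user attributes are mapped through
 * translated_attno[attno - 1], which holds the child's attribute number
 * or 0 for a column dropped from the parent.
 */
static void
copy_attr_needed(const SketchRel *parent, SketchRel *child,
				 const int *translated_attno)
{
	int			attno;

	for (attno = parent->min_attr; attno <= parent->max_attr; attno++)
	{
		int			index = attno - parent->min_attr;
		unsigned	needed = parent->attr_needed[index];

		if (attno <= 0)
		{
			/* System attributes do not need translation. */
			assert(parent->min_attr == child->min_attr);
			child->attr_needed[index] = needed;
		}
		else
		{
			int			child_attno = translated_attno[attno - 1];

			/* A dropped column cannot be referenced, so nothing to copy. */
			if (child_attno == 0)
			{
				assert(needed == 0);
				continue;
			}
			child->attr_needed[child_attno - child->min_attr] = needed;
		}
	}
}
```

The point of the mapping is that a child may order its columns differently from the parent, so the same "needed by" set has to land at the child's own attribute position rather than at the parent's.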
Attachment: v21-0004-More-refactoring-around-partitioned-table-Append.patch (text/plain; charset=UTF-8)
From db2ae93c6c52890188ae56bbb36ff76109c90152 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v21 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 10 +++
src/include/nodes/relation.h | 22 ++++++-
4 files changed, 115 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fee078a9c7..8f761a77e8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect the lists of the
+ * children that are themselves partitioned in the loop below and
+ * concatenate all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7b52dadd81..b0f6051618 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6189,14 +6189,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..4b5d50eb2c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +743,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 71689b8ed6..63623f2687 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,6 +529,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -657,10 +658,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
v21-0005-Teach-planner-to-use-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 17bba5221931ddaeccb658fbb4ad4d24955921c8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v21 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding the partition
if the clauses refute it. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using the information in its
PartitionScheme. Further, if we pass the set of matched clauses to
get_partitions_from_clauses(), we get the set of matching partitions
in (hopefully) less time than determining the same by running the
matching algorithm separately for each partition.
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/optimizer/path/allpaths.c | 63 +++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 7 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 442 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 74 ++++-
7 files changed, 581 insertions(+), 78 deletions(-)
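To make the contrast drawn in the commit message concrete, here is a minimal standalone sketch for a single-column range-partitioned table. It is not the patch's code and does not use any PostgreSQL API: ToyRangeBounds, toy_prune_eq_linear, toy_prune_eq_bsearch and toy_prune_le are hypothetical names invented for illustration, and the bounds are plain ints instead of a PartitionBoundInfo. The linear function mimics checking each partition's constraint in turn; the other two consult the sorted bounds once, which is roughly the idea behind matching clauses against the parent's partition key.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a range-partitioned table's bound descriptor.
 * upper[] holds nparts - 1 exclusive upper bounds in ascending order:
 *   partition 0          covers (-inf, upper[0])
 *   partition i          covers [upper[i - 1], upper[i])
 *   partition nparts - 1 covers [upper[nparts - 2], +inf)
 */
typedef struct ToyRangeBounds
{
    int         nparts;
    const int  *upper;
} ToyRangeBounds;

/*
 * Constraint-exclusion style: test "key = const" against every partition's
 * own bounds, one partition at a time.  O(nparts) per lookup.
 */
static int
toy_prune_eq_linear(const ToyRangeBounds *b, int key)
{
    int         i;

    for (i = 0; i < b->nparts; i++)
    {
        int         lo_ok = (i == 0) || (key >= b->upper[i - 1]);
        int         hi_ok = (i == b->nparts - 1) || (key < b->upper[i]);

        if (lo_ok && hi_ok)
            return i;
    }
    return -1;                  /* not reached when the bounds cover all keys */
}

/*
 * Bound-descriptor style: binary-search the sorted bounds once.  The number
 * of bounds that are <= key is the index of the matching partition.
 */
static int
toy_prune_eq_bsearch(const ToyRangeBounds *b, int key)
{
    int         lo = 0;
    int         hi = b->nparts - 1; /* number of entries in upper[] */

    while (lo < hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (b->upper[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * "key <= const": every partition up to and including the one containing
 * const may hold matching rows.  Returns a bitmask of live partition
 * indexes (bit i set means partition i must be scanned).
 */
static unsigned
toy_prune_le(const ToyRangeBounds *b, int key)
{
    int         last = toy_prune_eq_bsearch(b, key);

    assert(b->nparts > 0 && b->nparts < 32);    /* keep the shift well defined */
    return (1u << (last + 1)) - 1u;
}
```

With bounds {1, 2} and three partitions (the same layout as the rp test table in the regression tests of this patch), both equality functions agree on every key, but only the linear one visits each partition. The real code additionally has to handle multi-column keys, list and hash strategies, default partitions and NULLs, none of which this sketch attempts.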
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8f761a77e8..51648c80b3 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,8 +20,10 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
@@ -136,6 +138,14 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
+static List *match_clauses_to_partkey(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *clauses,
+ bool *contains_const,
+ bool *constfalse);
/*
@@ -847,6 +857,54 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ Relation partrel;
+ Bitmapset *partindexes;
+ List *result = NIL;
+ int i;
+
+ partrel = heap_open(rte->relid, NoLock);
+
+ partindexes = get_partitions_from_clauses(partrel, rel->relid,
+ rel->baserestrictinfo);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
+#endif
+
+ result = lappend(result, appinfo);
+ }
+
+ heap_close(partrel, NoLock);
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +946,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a35d068911..6949886e46 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1395,6 +1395,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even a dead partition look
+ * minimally valid, with a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8c60b35068..c103deb21b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1161,7 +1161,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1247,22 +1246,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1920,6 +1929,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 63623f2687..855d51ea09 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,10 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * Since partitioning might be using a collation for a given partition key
+ * column that is not the same as the collation implied by the column's
+ * type, store it separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +353,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..83e60814f7 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,363 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+ QUERY PLAN
+----------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+ QUERY PLAN
+-------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_bc
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_ef
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_g
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+ -> Seq Scan on lp_default
+ Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
+(11 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..13b12078bf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,76 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- various cases for list partitioning where pruning should work
+explain (costs off) select * from lp where a <> 'a' and a is not null;
+explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 18 January 2018 at 23:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've not fully worked out how run-time pruning
will use this as it'll need another version of
get_partitions_from_clauses but passes in a PartScanClauseInfo
instead, and does not call extract_partition_key_clauses. That area
probably needs some shuffling around so that does not end up a big
copy and paste of all that new logic.
So, I've been assuming that the planner changes in the run-time pruning
patch have to do with selecting clauses (restriction clauses not
containing Consts and/or join clauses) to be passed to the executor by
recording them in the Append node. Will they be selected by the planner
calling into partition.c?
I had thought so. I only have a rough idea in my head and that's that
the PartitionPruneInfo struct that I wrote for the run-time pruning
patch would have the clause List replaced with this new
PartScanClauseInfo struct (which likely means it needs to go into
primnodes.h), this struct would contain all the partition pruning
clauses in a more structured form so that no matching of quals to the
partition key would be required during execution. The idea is that
we'd just need to call remove_redundant_clauses,
extract_bounding_datums and get_partitions_for_keys. I think this is
the bare minimum of work that can be done in execution since we can't
remove the redundant clauses until we know the values of the Params.
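As a rough illustration of why the redundancy check needs the actual values, here is a minimal sketch that collapses several clauses on a single integer key into the tightest bounds and flags a contradiction. ToyOp, ToyClause, ToyBounds and toy_remove_redundant_clauses are hypothetical, simplified stand-ins invented for this example; they are not the definitions in partition.c, and real keys are not limited to integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the PostgreSQL definitions. */
typedef enum { TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT } ToyOp;

typedef struct ToyClause
{
    ToyOp op;     /* operator comparing the partition key to 'value' */
    int   value;  /* comparison value, known only once Params are bound */
} ToyClause;

typedef struct ToyBounds
{
    bool has_lo;      /* is 'lo' set? */
    bool has_hi;      /* is 'hi' set? */
    long lo;          /* inclusive lower bound on the key */
    long hi;          /* inclusive upper bound on the key */
    bool constfalse;  /* the clauses contradict each other */
} ToyBounds;

static void
toy_raise_lo(ToyBounds *b, long v)
{
    if (!b->has_lo || v > b->lo)
        b->lo = v;
    b->has_lo = true;
}

static void
toy_lower_hi(ToyBounds *b, long v)
{
    if (!b->has_hi || v < b->hi)
        b->hi = v;
    b->has_hi = true;
}

/* Collapse clauses on one integer key down to the most restrictive bounds. */
ToyBounds
toy_remove_redundant_clauses(const ToyClause *clauses, size_t nclauses)
{
    ToyBounds b = {false, false, 0, 0, false};
    size_t    i;

    assert(clauses != NULL || nclauses == 0);

    for (i = 0; i < nclauses; i++)
    {
        long v = clauses[i].value;

        switch (clauses[i].op)
        {
            case TOY_LT: toy_lower_hi(&b, v - 1); break;
            case TOY_LE: toy_lower_hi(&b, v); break;
            case TOY_EQ: toy_raise_lo(&b, v); toy_lower_hi(&b, v); break;
            case TOY_GE: toy_raise_lo(&b, v); break;
            case TOY_GT: toy_raise_lo(&b, v + 1); break;
        }

        /* An empty interval means no row, and so no partition, can match. */
        if (b.has_lo && b.has_hi && b.lo > b.hi)
        {
            b.constfalse = true;
            break;
        }
    }
    return b;
}
```

With the values unknown (as for unbound Params), neither the "which clause is tighter" comparison nor the contradiction test in this sketch could be evaluated, which is the reason this step would have to run at execution time.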
Likely this means there will need to be 2 functions externally
accessible for this in partition.c. I'm not sure exactly how best to
do this. Maybe it's fine just to have allpaths.c call
extract_partition_key_clauses to generate the PartScanClauseInfo then
call some version of get_partitions_from_clauses which does not do the
extract_partition_key_clauses duties. This is made more complex by the
code that handles adding the default partition bound to the quals. I'm
not yet sure where that should live.
I've also been thinking of having some sort of PartitionPruneContext
struct that we can pass around the functions. Runtime pruning had to
add structs which store the Param values to various functions which I
didn't like. It would be good to just add those to the context and
have them passed down without having to bloat the parameters in the
functions. I might try and do that tomorrow too. This should make the
footprint of the runtime pruning patch a bit smaller.
Meanwhile, here is a revised version (v21) that incorporates your changes.
I added you as the author in 0002 and 0005 patches. I guess a v22 will
have to follow very soon...
Thanks for merging that in.
I'll have a try at making this work tomorrow, although I'm not sure
yet how much time I'll have to dedicate to it as I have a few other
things to do too.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Amit,
It seems your mail system continually adds "[Sender Address Forgery]"
prefixes to messages. E.g. this mail now has
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
as its subject, whereas the mail you're replying to only had
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
two of them.
I think the two previous occurrences of this are also from you.
This is somewhat annoying, could you try to figure out a) what the
problem is b) how to prevent the subject being edited like that?
Regards,
Andres
Hello,
At Thu, 18 Jan 2018 11:41:00 -0800, Andres Freund <andres@anarazel.de> wrote in <20180118194100.dy3kxdtktsbvm4eq@alap3.anarazel.de>
Hi Amit,
It seems your mail system continually adds "[Sender Address Forgery]"
prefixes to messages. E.g. this mail now has
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
as its subject, whereas the mail you're replying to only had
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
two of them.
I think the two previous occurrences of this are also from you.
This is somewhat annoying, could you try to figure out a) what the
problem is b) how to prevent the subject being edited like that?
Our mail server is failing to fetch the SPF record for David's mails
that are received directly (not via the -hackers ML), and so the server
adds the prefix to the subject header. It is failing to fetch the SPF
record for 2ndquadrant.com. The reason might be that the envelope-from of
his mails is not consistent with his server's IP address.
Anyway, mails that come via the -hackers ML don't suffer from this, so
what Amit and I can do on our side is one of the following:
- Be careful to reply to the mails coming via the ML.
- Remove the added prefix by hand.
And I'd like to ask David to check his mail environment so
that an SPF record is available for his messages.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thank you Horiguchi-san!
On 2018/01/19 12:00, Kyotaro HORIGUCHI wrote:
At Thu, 18 Jan 2018 11:41:00 -0800, Andres Freund <andres@anarazel.de> wrote:
Hi Amit,
It seems your mail system continually adds "[Sender Address Forgery]"
prefixes to messages. E.g. this mail now has
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
as its subject, whereas the mail you're replying to only had
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
two of them.
I think the two previous occurrences of this are also from you.
This is somewhat annoying, could you try to figure out a) what the
problem is b) how to prevent the subject being edited like that?
Sorry about that. I had noticed it and had manually edited the subject
line once or twice before, but failed to do it every time.
Our mail server is failing to fetch the SPF record for David's mails
that are received directly (not via the -hackers ML), and so the server
adds the prefix to the subject header. It is failing to fetch the SPF
record for 2ndquadrant.com. The reason might be that the envelope-from of
his mails is not consistent with his server's IP address.
I was able to notice that too. It seems that the Received-SPF: PermError
and X-SPF-Status = fail/warn headers started appearing in the emails only
a few months ago, so it appears our mail server was changed to notice
these discrepancies only recently. The first email in this thread that got
such a subject line was the following, which is my reply to David's email:
/messages/by-id/b8094e71-2c73-ed8e-d8c3-53f232c8c049@lab.ntt.co.jp
There are emails from David before that one, but I couldn't see such
headers in any of them, so their subject lines were not altered.
Anyway, mails that come via the -hackers ML don't suffer from this, so
what Amit and I can do on our side is one of the following:
- Be careful to reply to the mails coming via the ML.
- Remove the added prefix by hand.
Yeah, will make sure to do that.
And I'd like to ask David to check his mail environment so
that an SPF record is available for his messages.
That'd be nice too.
Thanks,
Amit
On 19 January 2018 at 16:00, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
And I'd like to ask David to check his mail environment so
that an SPF record is available for his messages.
Will investigate
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Amit,
On 19 January 2018 at 04:03, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 18 January 2018 at 23:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
So, I've been assuming that the planner changes in the run-time pruning
patch have to do with selecting clauses (restriction clauses not
containing Consts and/or join clauses) to be passed to the executor by
recording them in the Append node. Will they be selected by the planner
calling into partition.c?
I had thought so. I only have a rough idea in my head and that's that
the PartitionPruneInfo struct that I wrote for the run-time pruning
patch would have the clause List replaced with this new
PartScanClauseInfo struct (which likely means it needs to go into
primnodes.h), this struct would contain all the partition pruning
clauses in a more structured form so that no matching of quals to the
partition key would be required during execution. The idea is that
we'd just need to call remove_redundant_clauses,
extract_bounding_datums and get_partitions_for_keys. I think this is
the bare minimum of work that can be done in execution since we can't
remove the redundant clauses until we know the values of the Params.
Likely this means there will need to be 2 functions externally
accessible for this in partition.c. I'm not sure exactly how best to
do this. Maybe it's fine just to have allpaths.c call
extract_partition_key_clauses to generate the PartScanClauseInfo then
call some version of get_partitions_from_clauses which does not do the
extract_partition_key_clauses duties. This is made more complex by the
code that handles adding the default partition bound to the quals. I'm
not yet sure where that should live.
I've also been thinking of having some sort of PartitionPruneContext
struct that we can pass around the functions. Runtime pruning had to
add structs which store the Param values to various functions which I
didn't like. It would be good to just add those to the context and
have them passed down without having to bloat the parameters in the
functions. I might try and do that tomorrow too. This should make the
footprint of the runtime pruning patch a bit smaller.
Attached is what I had in mind about how to do this. Only the planner
will need to call populate_partition_clause_info. The planner and
executor can call get_partitions_from_clauseinfo. I'll just need to
change the run-time prune patch to pass the PartitionClauseInfo into
the executor in the Append node.
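To make the intended split concrete, here is a toy model of the two entry points: one step that digests the clauses once (planner only) and one step that turns the digested form into a partition set (planner or executor). Everything prefixed with Toy or toy_ is a hypothetical simplification for illustration: clauses are reduced to "key = value" on one integer key, the table is range partitioned, and the result is a plain bitmask rather than a Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PartitionClauseInfo: "key = eq_value" only. */
typedef struct ToyClauseInfo
{
    bool foundkeyclauses;  /* did any clause match the partition key? */
    bool constfalse;       /* clauses contradict each other */
    int  eq_value;         /* value the key must equal, if foundkeyclauses */
} ToyClauseInfo;

/* Hypothetical stand-in for PartitionPruneContext. */
typedef struct ToyPruneContext
{
    const int     *upper_bounds;  /* exclusive upper bound of every partition
                                   * but the last, which is unbounded */
    int            nparts;        /* number of range partitions (< 32) */
    ToyClauseInfo *clauseinfo;    /* output of the populate step */
} ToyPruneContext;

/* Planner-only step: digest the "key = value" clauses into 'info'. */
void
toy_populate_clause_info(const int *eq_values, size_t nclauses,
                         ToyClauseInfo *info)
{
    size_t i;

    assert(nclauses > 0);   /* like the patch, require a non-empty list */
    memset(info, 0, sizeof(*info));

    for (i = 0; i < nclauses; i++)
    {
        if (info->foundkeyclauses && info->eq_value != eq_values[i])
        {
            /* key = 1 AND key = 2 can never hold; stop early. */
            info->constfalse = true;
            return;
        }
        info->eq_value = eq_values[i];
        info->foundkeyclauses = true;
    }
}

/* Shared step: returns a bitmask with one bit per surviving partition. */
unsigned
toy_get_partitions_from_clauseinfo(const ToyPruneContext *ctx)
{
    const ToyClauseInfo *info = ctx->clauseinfo;
    int i;

    assert(info != NULL);
    assert(ctx->nparts > 0 && ctx->nparts < 32);

    if (info->constfalse)
        return 0;                           /* nothing can match */
    if (!info->foundkeyclauses)
        return (1u << ctx->nparts) - 1;     /* cannot prune: all partitions */

    for (i = 0; i < ctx->nparts - 1; i++)
    {
        if (info->eq_value < ctx->upper_bounds[i])
            return 1u << i;
    }
    return 1u << (ctx->nparts - 1);
}
```

With upper bounds {1, 2} and three partitions, which mirrors the rp0/rp1/rp2 layout in the regression test, a single clause "key = 1" selects only the second partition, while "key = 1 AND key = 2" sets constfalse and selects none.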
I've also added the context struct that I mentioned above. It's
currently not carrying much weight, but the idea is that this context
will be passed around a bit more with the run-time pruning patch and
it will also carry the details about parameter values. I'll need to
modify a few signatures of functions like partkey_datum_from_expr to
pass the context there too. I didn't do that here because currently,
those functions have no use for the context with the fields that they
currently have.
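A minimal sketch of what carrying parameter values in the context could look like, assuming a much simplified expression type. ToyExpr, ToyParamContext and toy_partkey_datum_from_expr are hypothetical names for this example only; the real partkey_datum_from_expr takes a PartitionKey and works on Expr nodes and Datums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression: either a constant or a reference to a parameter. */
typedef enum { TOY_CONST, TOY_PARAM } ToyExprKind;

typedef struct ToyExpr
{
    ToyExprKind kind;
    int         constvalue;  /* used when kind == TOY_CONST */
    int         paramid;     /* used when kind == TOY_PARAM */
} ToyExpr;

/* Hypothetical context: parameter values travel here, not as extra args. */
typedef struct ToyParamContext
{
    const int *paramvalues;  /* NULL at plan time, when values are unknown */
    int        nparams;
} ToyParamContext;

/*
 * Try to resolve 'expr' to a concrete value.  Returns false if the value
 * is not available yet, in which case the caller cannot prune on it.
 */
bool
toy_partkey_datum_from_expr(const ToyParamContext *ctx, const ToyExpr *expr,
                            int *value)
{
    assert(ctx != NULL && expr != NULL && value != NULL);

    switch (expr->kind)
    {
        case TOY_CONST:
            *value = expr->constvalue;
            return true;
        case TOY_PARAM:
            if (ctx->paramvalues != NULL &&
                expr->paramid >= 0 && expr->paramid < ctx->nparams)
            {
                *value = ctx->paramvalues[expr->paramid];
                return true;
            }
            return false;
    }
    return false;
}
```

The point of the design is that the function signature stays the same for planner and executor callers; only the contents of the context differ.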
I've also fixed a bug where, when you built the commutator OpExpr in
what's now called extract_partition_key_clauses, the inputcollid was
not being properly set. The changes I made there go much further than
just that: I've completely removed the OpExpr from the PartClause
struct, as only two of its members were ever used. I thought it was better
just to add those to PartClause instead.
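A toy model of the commutation step may help here. It assumes a clause of the form "value OP key" has to be rewritten as "key OP' value" before it can be stored, and that the collation has to be carried over in both cases. ToyOpno, ToyOpExpr, ToyPartClause and the toy_ functions are hypothetical simplifications; the real code uses Oids and get_commutator().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operator identifiers standing in for operator Oids. */
typedef enum { TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT } ToyOpno;

/* Hypothetical input clause: the partition key compared with a value. */
typedef struct ToyOpExpr
{
    ToyOpno  opno;
    unsigned inputcollid;  /* collation used for the comparison */
    bool     key_on_left;  /* false means the clause reads "value OP key" */
    int      other_arg;    /* the non-key argument */
} ToyOpExpr;

/* Hypothetical PartClause holding only the fields that pruning needs. */
typedef struct ToyPartClause
{
    ToyOpno  opno;         /* operator to compare partkey to 'value' */
    unsigned inputcollid;  /* collation to compare partkey to 'value' */
    int      value;        /* the value the key is being compared to */
} ToyPartClause;

/* "a < b" is equivalent to "b > a", and so on; equality commutes to itself. */
static ToyOpno
toy_get_commutator(ToyOpno op)
{
    switch (op)
    {
        case TOY_LT: return TOY_GT;
        case TOY_LE: return TOY_GE;
        case TOY_EQ: return TOY_EQ;
        case TOY_GE: return TOY_LE;
        case TOY_GT: return TOY_LT;
    }
    return op;
}

/* Normalize the clause so that the key is always the left operand. */
ToyPartClause
toy_make_part_clause(const ToyOpExpr *clause)
{
    ToyPartClause pc;

    assert(clause != 0);

    pc.opno = clause->key_on_left ? clause->opno
                                  : toy_get_commutator(clause->opno);
    /* Copy the collation in both branches, not only the uncommuted one. */
    pc.inputcollid = clause->inputcollid;
    pc.value = clause->other_arg;
    return pc;
}
```

Storing opno and inputcollid directly means there is no intermediate node whose fields could be left unset when the operator is swapped.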
I also did some renaming of variables that always assumed that the
Expr being compared to the partition key was a constant. This is true
now, but with run-time pruning patch, it won't be, so I thought it was
better to do this here rather than having to rename them in the
run-time pruning patch.
One thing I don't yet understand about the patch is the use of
predicate_refuted_by() in get_partitions_from_or_clause_args(). I did
adjust the comment above that code, but I'm still not certain I fully
understand why that function has to be used instead of building the
clauses for the OR being processed along with the remaining clauses.
Is it that this was so hard to extract that you ended up using
predicate_refuted_by()?
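For reference, here is how I read the shape of that code, as a toy model and not a statement of the intended design: each OR arm is first tested against the table's own partition constraint, refuted arms are skipped, and the partition sets of the remaining arms are unioned. All names are hypothetical; arms are reduced to "key = value", the constraint and partitions to integer ranges, and one extra bit stands for the default partition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical range: lo <= key < hi. */
typedef struct ToyRange
{
    int lo;
    int hi;
} ToyRange;

/* Is "key = eq_value" impossible given the table's own constraint? */
static bool
toy_refuted_by(const ToyRange *constr, int eq_value)
{
    return eq_value < constr->lo || eq_value >= constr->hi;
}

/* Partitions matching "key = v"; bit 'nparts' is the default partition. */
static unsigned
toy_partitions_for_eq(const ToyRange *parts, int nparts, int v)
{
    int i;

    for (i = 0; i < nparts; i++)
    {
        if (v >= parts[i].lo && v < parts[i].hi)
            return 1u << i;
    }
    return 1u << nparts;    /* no explicit partition accepts v */
}

/*
 * Union of the partitions selected by each OR arm.  'constr' is the
 * sub-partitioned table's own partition constraint, or NULL for a root.
 */
unsigned
toy_partitions_from_or_args(const ToyRange *constr,
                            const ToyRange *parts, int nparts,
                            const int *arms, size_t narms)
{
    unsigned result = 0;
    size_t   i;

    assert(nparts >= 0 && nparts < 31);

    for (i = 0; i < narms; i++)
    {
        /* Skip arms that can never be true for this table. */
        if (constr != NULL && toy_refuted_by(constr, arms[i]))
            continue;
        result |= toy_partitions_for_eq(parts, nparts, arms[i]);
    }
    return result;
}
```

In this model, an arm such as "key = 25" on a table constrained to [10, 20) would select the default partition if the refutation check were dropped, even though no such row can exist there.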
I've also removed the makeBoolExpr call that you were encapsulating
the or_clauses in. I didn't really see the need for this since you
just removed it again when looping over the or_clauses.
The only other changes are just streamlining code and comment changes.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
faster_partition_prune_v21_delta_drowley_v1.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b35e35c..ad789f6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -173,8 +173,9 @@ typedef struct PartitionBoundCmpArg
*/
typedef struct PartClause
{
- OpExpr *op;
- Expr *constarg;
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
/* cached info. */
bool valid_cache; /* Are the following fields populated? */
@@ -195,24 +196,6 @@ typedef enum PartOpStrategy
} PartOpStrategy;
/*
- * Clauses matched to partition keys
- */
-typedef struct PartScanClauseInfo
-{
- /* Lists of clauses indexed by partition key */
- List *clauses[PARTITION_MAX_KEYS];
-
- List *or_clauses; /* List of clauses found in an OR branch */
- List *ne_clauses; /* Clauses in the form partkey <> Expr */
-
- Bitmapset *keyisnull;
- Bitmapset *keyisnotnull;
-
- /* Stored data is known to contain impossible contradictions */
- bool constfalse;
-} PartScanClauseInfo;
-
-/*
* PartScanKeyInfo
* Information about partition look up keys to be passed to
* get_partitions_for_keys()
@@ -307,26 +290,25 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
-static Bitmapset *get_partitions_from_clauses_recurse(Relation relation,
- int rt_index, List *clauses);
static Bitmapset *get_partitions_excluded_by_ne_clauses(Relation relation,
List *ne_clauses);
-static Bitmapset *get_partitions_from_or_clause_args(Relation relation,
- int rt_index, List *or_clause_args);
-static bool extract_partition_key_clauses(PartitionKey partkey, List *clauses,
- int rt_index, PartScanClauseInfo *partclauses);
+static Bitmapset *get_partitions_from_or_clause_args(
+ PartitionPruneContext *context,
+ List *or_clause_args);
+static void extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index, PartitionClauseInfo *partclauses);
static bool extract_bounding_datums(PartitionKey partkey,
- PartScanClauseInfo *partclauses,
+ PartitionPruneContext *context,
PartScanKeyInfo *keys);
static bool partition_cmp_args(PartitionKey key, int partkeyidx,
- PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
bool *result);
-static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *op,
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *pc,
bool *incl);
static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value);
static void remove_redundant_clauses(PartitionKey partkey,
- PartScanClauseInfo *partclauses);
+ PartitionPruneContext *context);
static Bitmapset *get_partitions_for_keys(Relation rel,
PartScanKeyInfo *keys);
static Bitmapset *get_partitions_for_keys_hash(Relation rel,
@@ -1707,38 +1689,46 @@ get_partition_qual_relid(Oid relid)
}
/*
- * get_partitions_from_clauses
- * Determine all partitions of 'relation' that could possibly contain a
- * record that matches 'partclauses'
+ * populate_partition_clause_info
+ * Processes 'clauses' to try to match them to relation's partition
+ * keys. If any clauses are found which match a partition key, then
+ * these clauses are stored in 'partclauseinfo'.
*
- * Returns a Bitmapset of the matching partition indexes, or NULL if none can
- * match.
+ * The caller must ensure that 'clauses' is not an empty List. Upon return,
+ * callers must also check if the 'partclauseinfo' constfalse has been set, if
+ * so, then they must be aware that the 'partclauseinfo' may only be partially
+ * populated.
*/
-Bitmapset *
-get_partitions_from_clauses(Relation relation, int rt_index,
- List *partclauses)
+void
+populate_partition_clause_info(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionDesc partdesc;
+ PartitionKey partkey;
PartitionBoundInfo boundinfo;
- List *clauses;
- /* All partitions match if there are no clauses */
- if (!partclauses)
- return bms_add_range(NULL, 0, partdesc->nparts - 1);
+ Assert(clauses != NIL);
+
+ partkey = RelationGetPartitionKey(relation);
+ partdesc = RelationGetPartitionDesc(relation);
/* Some functions called below modify this list */
- clauses = list_copy(partclauses);
+ clauses = list_copy(clauses);
boundinfo = partdesc->boundinfo;
/*
- * If relation is a sub-partitioned table, add its partition constraint
- * clauses to the list of clauses to use for partition pruning. This
- * is done to facilitate correct decision regarding the default
- * partition. Adding the partition constraint clauses to the list helps
- * restrict the possible key space to only that allowed by the partition
- * and thus avoids the default partition being inadvertently added to the
- * set of selected partitions for a query whose clauses select a key space
- * bigger than the partition's.
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement
+ * is perhaps unlikely for non-default partitions, but it may be more
+ * likely in the case of default partitions, so we'll add the parent
+ * partition table's partition qual to the clause list in this case only.
+ * This may result in the default partition being eliminated.
*/
if (partition_bound_has_default(boundinfo))
{
@@ -1753,47 +1743,56 @@ get_partitions_from_clauses(Relation relation, int rt_index,
clauses = list_concat(clauses, partqual);
}
- return get_partitions_from_clauses_recurse(relation, rt_index, clauses);
+ extract_partition_key_clauses(partkey, clauses, rt_index, partclauseinfo);
}
-/* Module-local functions */
-
/*
- * get_partitions_from_clauses_recurse
- * Determine relation's partitions that satisfy *all* of the clauses
- * in the list
+ * get_partitions_from_clauses
+ * Determine all partitions of the context 'relation' that could possibly
+ * contain a record that matches the context 'clauseinfo'
*
- * Return value is a Bitmapset containing the indexes of selected partitions.
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
*/
-static Bitmapset *
-get_partitions_from_clauses_recurse(Relation relation, int rt_index,
- List *clauses)
+Bitmapset *
+get_partitions_from_clauseinfo(PartitionPruneContext *context)
{
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
- PartitionKey partkey = RelationGetPartitionKey(relation);
- PartScanClauseInfo partclauses;
- Bitmapset *result;
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartitionDesc partdesc;
+ PartitionBoundInfo boundinfo;
+ PartitionKey partkey;
PartScanKeyInfo keys;
+ Bitmapset *result;
+ Relation relation;
+ int rt_index;
ListCell *lc;
- /* Populate partclauses from the clause list */
- if (extract_partition_key_clauses(partkey, clauses, rt_index, &partclauses))
- {
- /*
- * No partitions to scan if extract_partition_key_clauses found some
- * clause contradiction.
- */
- if (partclauses.constfalse)
- return NULL;
+ Assert(partclauseinfo != NULL);
+ /*
+ * Check if there were proofs that no partitions can match due to some
+ * clause items contradicting each other.
+ */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ relation = context->relation;
+ rt_index = context->rt_index;
+
+ partdesc = RelationGetPartitionDesc(relation);
+ boundinfo = partdesc->boundinfo;
+ partkey = RelationGetPartitionKey(relation);
+
+ if (partclauseinfo->foundkeyclauses)
+ {
/* collapse clauses down to the most restrictive set */
- remove_redundant_clauses(partkey, &partclauses);
+ remove_redundant_clauses(partkey, context);
/* Did remove_redundant_clauses find any contradicting clauses? */
- if (partclauses.constfalse)
+ if (partclauseinfo->constfalse)
return NULL;
- if (extract_bounding_datums(partkey, &partclauses, &keys))
+ if (extract_bounding_datums(partkey, context, &keys))
{
result = get_partitions_for_keys(relation, &keys);
@@ -1816,34 +1815,35 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
}
else
{
- /*
- * no useful key clauses found, but we might still be able to
- * eliminate some partitions with ne_clauses or or_clauses.
- */
result = bms_add_range(NULL, 0, partdesc->nparts - 1);
}
/* Select partitions by applying the clauses containing <> operators. */
- if (partclauses.ne_clauses)
+ if (partclauseinfo->ne_clauses)
{
- Bitmapset *ne_clause_parts;
+ Bitmapset *ne_parts;
- ne_clause_parts = get_partitions_excluded_by_ne_clauses(relation,
- partclauses.ne_clauses);
+ ne_parts = get_partitions_excluded_by_ne_clauses(relation,
+ partclauseinfo->ne_clauses);
/* Remove any partitions we found to not be needed */
- result = bms_del_members(result, ne_clause_parts);
- bms_free(ne_clause_parts);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
}
/* Select partitions by applying OR clauses. */
- foreach(lc, partclauses.or_clauses)
+ foreach(lc, partclauseinfo->or_clauses)
{
- BoolExpr *or = (BoolExpr *) lfirst(lc);
+ List *or_args = (List *) lfirst(lc);
+ PartitionPruneContext orcontext;
Bitmapset *or_parts;
- or_parts = get_partitions_from_or_clause_args(relation, rt_index,
- or->args);
+ orcontext.rt_index = context->rt_index;
+ orcontext.relation = context->relation;
+ orcontext.clauseinfo = NULL;
+
+ or_parts = get_partitions_from_or_clause_args(&orcontext, or_args);
+
/*
* Clauses in or_clauses are mutually conjunctive and also in
* in conjunction with the rest of the clauses above, so combine the
@@ -1857,6 +1857,8 @@ get_partitions_from_clauses_recurse(Relation relation, int rt_index,
return result;
}
+/* Module-local functions */
+
/*
* get_partitions_excluded_by_ne_clauses
*
@@ -1892,7 +1894,7 @@ get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
PartClause *pc = (PartClause *) lfirst(lc);
Datum datum;
- if (partkey_datum_from_expr(partkey, 0, pc->constarg, &datum))
+ if (partkey_datum_from_expr(partkey, 0, pc->value, &datum))
{
int offset;
bool is_equal;
@@ -1973,17 +1975,25 @@ get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
* clause in or_clause_args.
*/
static Bitmapset *
-get_partitions_from_or_clause_args(Relation relation, int rt_index,
+get_partitions_from_or_clause_args(PartitionPruneContext *context,
List *or_clause_args)
{
- ListCell *lc;
- Bitmapset *result = NULL;
+ List *partconstr;
+ PartitionKey partkey;
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ partconstr = RelationGetPartitionQual(context->relation);
+ partkey = RelationGetPartitionKey(context->relation);
+
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->rt_index, 0);
foreach(lc, or_clause_args)
{
- List *arg_clauses = list_make1(lfirst(lc));
- List *partconstr = RelationGetPartitionQual(relation);
- Bitmapset *arg_partset;
+ List *clauses = list_make1(lfirst(lc));
+ PartitionClauseInfo partclauseinfo;
/*
* It's possible that this clause is never true for this relation due
@@ -1992,23 +2002,29 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
* OR clause may not contain any quals matching this partition table's
* partition key, it may contain some belonging to a parent partition
* though, so we may not have all the quals here required to make use
- * of get_partitions_from_clauses_recurse to determine the correct set
- * of partitions, so we'll just make use of predicate_refuted_by
- * instead.
+ * of get_partitions_from_clauseinfo to determine the correct set of
+ * partitions, so we'll just make use of predicate_refuted_by instead.
*/
- if (partconstr)
+ if (partconstr && predicate_refuted_by(partconstr, clauses, false))
+ continue;
+
+ extract_partition_key_clauses(partkey, clauses, context->rt_index,
+ &partclauseinfo);
+
+ if (!partclauseinfo.constfalse)
{
- partconstr = (List *) expression_planner((Expr *) partconstr);
- if (rt_index != 1)
- ChangeVarNodes((Node *) partconstr, 1, rt_index, 0);
- if (predicate_refuted_by(partconstr, arg_clauses, false))
- continue;
- }
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ subcontext.rt_index = context->rt_index;
+ subcontext.relation = context->relation;
+ subcontext.clauseinfo = &partclauseinfo;
+
+ arg_partset = get_partitions_from_clauseinfo(&subcontext);
- arg_partset = get_partitions_from_clauses_recurse(relation, rt_index,
- arg_clauses);
- result = bms_add_members(result, arg_partset);
- bms_free(arg_partset);
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
}
return result;
@@ -2028,28 +2044,25 @@ get_partitions_from_or_clause_args(Relation relation, int rt_index,
* extract_partition_key_clauses
* Process 'clauses' to extract clause matching the partition key.
* This populates 'partclauses' with the set of clauses matching each
- * key also also collects other useful clauses to assist in partition
+ * key and also collects other useful clauses to assist in partition
* elimination, such as or clauses and not equal clauses. We also record
* which partitions keys we can prove are NULL or NOT NULL.
*
- * We may also discover some contradition in the clauses which means that no
- * partition can possibly match. In this case the function sets partclauses's
- * 'constfalse' to true and returns true. In this case the caller should not
- * assume the clauses have been fully processed as we abort as soon as we find
- * a contradicting condition.
- *
- * The function returns false if no useful key clauses were found.
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * partclauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the PartitionClauseInfo is fully populated with all clauses.
*/
-static bool
+static void
extract_partition_key_clauses(PartitionKey partkey, List *clauses,
int rt_index,
- PartScanClauseInfo *partclauses)
+ PartitionClauseInfo *partclauseinfo)
{
int i;
ListCell *lc;
- bool got_useful_keys = false;
- memset(partclauses, 0, sizeof(PartScanClauseInfo));
+ memset(partclauseinfo, 0, sizeof(PartitionClauseInfo));
foreach(lc, clauses)
{
@@ -2064,8 +2077,8 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (rinfo->pseudoconstant &&
!DatumGetBool(((Const *) clause)->constvalue))
{
- partclauses->constfalse = true;
- return true;
+ partclauseinfo->constfalse = true;
+ return;
}
}
@@ -2074,7 +2087,9 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
{
if (or_clause((Node *) clause))
{
- partclauses->or_clauses = lappend(partclauses->or_clauses, clause);
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
continue;
}
else if (and_clause((Node *) clause))
@@ -2127,7 +2142,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
OpExpr *opclause = (OpExpr *) clause;
Expr *leftop,
*rightop,
- *constexpr;
+ *valueexpr;
bool is_ne_listp = false;
leftop = (Expr *) get_leftop(clause);
@@ -2139,10 +2154,10 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
/* check if the clause matches the partition key */
if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
- constexpr = rightop;
+ valueexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
{
- constexpr = leftop;
+ valueexpr = leftop;
commutator = get_commutator(opclause->opno);
@@ -2169,7 +2184,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
continue;
/* Useless if the "constant" can change its value. */
- if (contain_volatile_functions((Node *) constexpr))
+ if (contain_volatile_functions((Node *) valueexpr))
continue;
/*
@@ -2215,25 +2230,9 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
}
pc = (PartClause *) palloc0(sizeof(PartClause));
- pc->constarg = constexpr;
-
- /*
- * If commutator is set to a valid Oid then we'll need to swap
- * the left and right operands. Later code requires that the
- * partkey is on the left side.
- */
- if (!OidIsValid(commutator))
- pc->op = opclause;
- else
- {
- OpExpr *commuted;
-
- commuted = (OpExpr *) copyObject(opclause);
- commuted->opno = commutator;
- commuted->opfuncid = get_opcode(commutator);
- commuted->args = list_make2(rightop, leftop);
- pc->op = commuted;
- }
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
/*
* We don't turn a <> operator clause into a key right away.
@@ -2241,24 +2240,28 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
* get_partitions_excluded_by_ne_clauses().
*/
if (is_ne_listp)
- partclauses->ne_clauses = lappend(partclauses->ne_clauses,
- pc);
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
else
{
- partclauses->clauses[i] = lappend(partclauses->clauses[i], pc);
- got_useful_keys = true;
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
/*
* Since we only allow strict operators, require keys to
* be not null.
*/
- if (bms_is_member(i, partclauses->keyisnull))
+ if (bms_is_member(i, partclauseinfo->keyisnull))
{
- partclauses->constfalse = true;
- return true;
+ partclauseinfo->constfalse = true;
+ return;
}
- partclauses->keyisnotnull =
- bms_add_member(partclauses->keyisnotnull, i);
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
}
}
else if (IsA(clause, ScalarArrayOpExpr))
@@ -2418,9 +2421,9 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
* of the list that's being processed currently.
*/
if (saop->useOr)
- partclauses->or_clauses = lappend(partclauses->or_clauses,
- makeBoolExpr(OR_EXPR, elem_clauses,
- -1));
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
else
clauses = list_concat(clauses, elem_clauses);
}
@@ -2438,27 +2441,29 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (nulltest->nulltesttype == IS_NULL)
{
/* check for conflicting IS NOT NULLs */
- if (bms_is_member(i, partclauses->keyisnotnull))
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
{
- partclauses->constfalse = true;
- return true;
+ partclauseinfo->constfalse = true;
+ return;
}
- partclauses->keyisnull =
- bms_add_member(partclauses->keyisnull, i);
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
}
else
{
/* check for conflicting IS NULLs */
- if (bms_is_member(i, partclauses->keyisnull))
+ if (bms_is_member(i, partclauseinfo->keyisnull))
{
- partclauses->constfalse = true;
- return true;
+ partclauseinfo->constfalse = true;
+ return;
}
- partclauses->keyisnotnull =
- bms_add_member(partclauses->keyisnotnull, i);
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
}
- got_useful_keys = true;
+ partclauseinfo->foundkeyclauses = true;
}
}
/*
@@ -2474,8 +2479,6 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
Expr *leftop,
*rightop;
- pc = (PartClause *) palloc0(sizeof(PartClause));
-
if (IsA(clause, BooleanTest))
{
BooleanTest *btest = (BooleanTest *) clause;
@@ -2514,19 +2517,19 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
? (Expr *) makeBoolConst(true, false)
: (Expr *) makeBoolConst(false, false);
}
- pc->op = (OpExpr *) make_opclause(BooleanEqualOperator,
- BOOLOID, false,
- leftop, rightop,
- InvalidOid, InvalidOid);
- pc->constarg = rightop;
- partclauses->clauses[i] = lappend(partclauses->clauses[i],
- pc);
- got_useful_keys = true;
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
}
}
}
-
- return got_useful_keys;
}
/*
@@ -2543,21 +2546,24 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
* Returns true if any keys were found during partition pruning.
*/
static bool
-extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
+extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
PartScanKeyInfo *keys)
{
+ PartitionClauseInfo *clauseinfo;
bool need_next_eq,
need_next_min,
need_next_max;
int i;
ListCell *lc;
+ clauseinfo = context->clauseinfo;
+
/*
* Based on the strategies of the clause operators (=, </<=, >/>=), try to
- * construct tuples of those datums that serve as the exact look up tuple
+ * construct tuples of those datums that serve as the exact lookup tuple
* or tuples that serve as minimum and maximum bound. If we find datums
* for all partition key columns that appear in = operator clauses, then
- * we have the exact match look up tuple, which will be used to match just
+ * we have the exact match lookup tuple, which will be used to match just
* one partition. If the last datum in a tuple comes from a clause
* containing </<= or >/>= operator, then that constitutes the minimum
* or maximum bound tuple, respectively. There is one exception -- if
@@ -2572,7 +2578,7 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
memset(keys, 0, sizeof(PartScanKeyInfo));
for (i = 0; i < partkey->partnatts; i++)
{
- List *clauselist = partclauses->clauses[i];
+ List *clauselist = clauseinfo->keyclauses[i];
/*
* Min and max keys must constitute a prefix of the partition key and
@@ -2592,7 +2598,7 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
foreach(lc, clauselist)
{
PartClause *clause = (PartClause *) lfirst(lc);
- Expr *constarg = clause->constarg;
+ Expr *value = clause->value;
bool incl;
PartOpStrategy op_strategy;
@@ -2602,12 +2608,12 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
case PART_OP_EQUAL:
Assert(incl);
if (need_next_eq &&
- partkey_datum_from_expr(partkey, i, constarg,
+ partkey_datum_from_expr(partkey, i, value,
&keys->eqkeys[i]))
keys->n_eqkeys++;
if (need_next_max &&
- partkey_datum_from_expr(partkey, i, constarg,
+ partkey_datum_from_expr(partkey, i, value,
&keys->maxkeys[i]))
{
keys->n_maxkeys++;
@@ -2615,7 +2621,7 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
}
if (need_next_min &&
- partkey_datum_from_expr(partkey, i, constarg,
+ partkey_datum_from_expr(partkey, i, value,
&keys->minkeys[i]))
{
keys->n_minkeys++;
@@ -2625,7 +2631,7 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
case PART_OP_LESS:
if (need_next_max &&
- partkey_datum_from_expr(partkey, i, constarg,
+ partkey_datum_from_expr(partkey, i, value,
&keys->maxkeys[i]))
{
keys->n_maxkeys++;
@@ -2637,7 +2643,7 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
case PART_OP_GREATER:
if (need_next_min &&
- partkey_datum_from_expr(partkey, i, constarg,
+ partkey_datum_from_expr(partkey, i, value,
&keys->minkeys[i]))
{
keys->n_minkeys++;
@@ -2670,8 +2676,8 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
keys->n_eqkeys = 0;
/* Finally, also set the keyisnull and keyisnotnull values. */
- keys->keyisnull = partclauses->keyisnull;
- keys->keyisnotnull = partclauses->keyisnotnull;
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
@@ -2680,57 +2686,54 @@ extract_bounding_datums(PartitionKey partkey, PartScanClauseInfo *partclauses,
/*
* partition_op_strategy
- * Returns whether the clause in 'op' contains an =, </<=, or >/>=
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
* operator and set *incl to true if the operator's strategy is
* inclusive.
*/
static PartOpStrategy
-partition_op_strategy(PartitionKey key, PartClause *op, bool *incl)
+partition_op_strategy(PartitionKey key, PartClause *pc, bool *incl)
{
- PartOpStrategy result;
+ *incl = false; /* overwritten below */
- *incl = false; /* overwritten as appropriate below */
switch (key->strategy)
{
/* Hash partitioning allows only hash equality. */
case PARTITION_STRATEGY_HASH:
- if (op->op_strategy == HTEqualStrategyNumber)
+ if (pc->op_strategy == HTEqualStrategyNumber)
{
*incl = true;
- result = PART_OP_EQUAL;
+ return PART_OP_EQUAL;
}
- break;
+ elog(ERROR, "unexpected strategy number: %d",
+ pc->op_strategy);
/* List and range partitioning support all btree operators. */
case PARTITION_STRATEGY_LIST:
case PARTITION_STRATEGY_RANGE:
- switch (op->op_strategy)
+ switch (pc->op_strategy)
{
case BTLessEqualStrategyNumber:
*incl = true;
/* fall through */
case BTLessStrategyNumber:
- result = PART_OP_LESS;
- break;
+ return PART_OP_LESS;
+
case BTEqualStrategyNumber:
*incl = true;
- result = PART_OP_EQUAL;
- break;
+ return PART_OP_EQUAL;
case BTGreaterEqualStrategyNumber:
*incl = true;
/* fall through */
case BTGreaterStrategyNumber:
- result = PART_OP_GREATER;
- break;
+ return PART_OP_GREATER;
}
- break;
default:
elog(ERROR, "unexpected partition strategy: %d",
(int) key->strategy);
}
- return result;
+ return PART_OP_EQUAL; /* keep compiler quiet */
}
/*
@@ -2811,10 +2814,11 @@ partkey_datum_from_expr(PartitionKey key, int partkeyidx,
*/
static void
remove_redundant_clauses(PartitionKey partkey,
- PartScanClauseInfo *partclauses)
+ PartitionPruneContext *context)
{
PartClause *hash_clause,
*btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
ListCell *lc;
int s;
int i;
@@ -2823,29 +2827,29 @@ remove_redundant_clauses(PartitionKey partkey,
for (i = 0; i < partkey->partnatts; i++)
{
- List *all_clauses = partclauses->clauses[i];
+ List *keyclauses = partclauseinfo->keyclauses[i];
hash_clause = NULL;
newlist = NIL;
memset(btree_clauses, 0, sizeof(btree_clauses));
- foreach(lc, all_clauses)
+ foreach(lc, keyclauses)
{
- PartClause *cur = (PartClause *) lfirst(lc);
+ PartClause *pc = (PartClause *) lfirst(lc);
- if (!cur->valid_cache)
+ if (!pc->valid_cache)
{
Oid lefttype;
- get_op_opfamily_properties(cur->op->opno,
+ get_op_opfamily_properties(pc->opno,
partkey->partopfamily[i],
false,
- &cur->op_strategy,
+ &pc->op_strategy,
&lefttype,
- &cur->op_subtype);
- fmgr_info(get_opcode(cur->op->opno), &cur->op_func);
- cur->valid_cache = true;
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
}
/*
@@ -2857,15 +2861,15 @@ remove_redundant_clauses(PartitionKey partkey,
if (partkey->strategy == PARTITION_STRATEGY_HASH)
{
if (hash_clause == NULL)
- hash_clause = cur;
+ hash_clause = pc;
/* check if another clause would contradict the one we have */
else if (partition_cmp_args(partkey, i,
- cur, cur, hash_clause,
+ pc, pc, hash_clause,
&test_result))
{
if (!test_result)
{
- partclauses->constfalse = true;
+ partclauseinfo->constfalse = true;
return;
}
}
@@ -2876,7 +2880,7 @@ remove_redundant_clauses(PartitionKey partkey,
* partition-pruning with it.
*/
else
- newlist = lappend(newlist, cur);
+ newlist = lappend(newlist, pc);
/*
* The code below handles btree operators, so not relevant for
@@ -2893,10 +2897,10 @@ remove_redundant_clauses(PartitionKey partkey,
* operator strategy type s+1; it is NULL if we haven't yet found
* such a clause.
*/
- s = cur->op_strategy - 1;
+ s = pc->op_strategy - 1;
if (btree_clauses[s] == NULL)
{
- btree_clauses[s] = cur;
+ btree_clauses[s] = pc;
}
else
{
@@ -2916,15 +2920,15 @@ remove_redundant_clauses(PartitionKey partkey,
* effectively discard a < 7 as being redundant.
*/
if (partition_cmp_args(partkey, i,
- cur, cur, btree_clauses[s],
+ pc, pc, btree_clauses[s],
&test_result))
{
/* cur is more restrictive, so replace the existing. */
if (test_result)
- btree_clauses[s] = cur;
+ btree_clauses[s] = pc;
else if (s == BTEqualStrategyNumber - 1)
{
- partclauses->constfalse = true;
+ partclauseinfo->constfalse = true;
return;
}
@@ -2937,7 +2941,7 @@ remove_redundant_clauses(PartitionKey partkey,
* the previous one in btree_clauses[s] and push this one directly
* to the output list.
*/
- newlist = lappend(newlist, cur);
+ newlist = lappend(newlist, pc);
}
}
}
@@ -2947,8 +2951,8 @@ remove_redundant_clauses(PartitionKey partkey,
/* Note we didn't add this one to the result yet. */
if (hash_clause)
newlist = lappend(newlist, hash_clause);
- list_free(partclauses->clauses[i]);
- partclauses->clauses[i] = newlist;
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
continue;
}
@@ -2979,7 +2983,7 @@ remove_redundant_clauses(PartitionKey partkey,
{
if (!test_result)
{
- partclauses->constfalse = true;
+ partclauseinfo->constfalse = true;
return;
}
/* Discard the no longer needed clause. */
@@ -3046,8 +3050,8 @@ remove_redundant_clauses(PartitionKey partkey,
* Replace the old List with the new one with the redundant clauses
* removed.
*/
- list_free(partclauses->clauses[i]);
- partclauses->clauses[i] = newlist;
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
}
}
@@ -3058,53 +3062,62 @@ remove_redundant_clauses(PartitionKey partkey,
* of this comparison.
*
* Returns true if we could actually perform the comparison; otherwise false.
- * We may not be able to perform the comparison if operand values are
- * unavailable and/or types of operands are incompatible with the operator.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
*/
static bool
partition_cmp_args(PartitionKey key, int partkeyidx,
- PartClause *op, PartClause *leftarg, PartClause *rightarg,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
bool *result)
{
- Oid partopfamily = key->partopfamily[partkeyidx];
- Datum leftarg_const,
- rightarg_const;
+ Datum left_value;
+ Datum right_value;
- Assert(op->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
- /* Get the constant values from the operands */
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
if (!partkey_datum_from_expr(key, partkeyidx,
- leftarg->constarg, &leftarg_const))
+ leftarg->value, &left_value))
return false;
+
if (!partkey_datum_from_expr(key, partkeyidx,
- rightarg->constarg, &rightarg_const))
+ rightarg->value, &right_value))
return false;
/*
- * We can compare leftarg_const and rightarg_const using op's operator
- * only if both are of the type expected by it.
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
*/
- if (leftarg->op_subtype == op->op_subtype &&
- rightarg->op_subtype == op->op_subtype)
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
{
- *result = DatumGetBool(FunctionCall2Coll(&op->op_func,
- op->op->inputcollid,
- leftarg_const,
- rightarg_const));
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
return true;
}
else
{
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Oid cmp_op;
+
/* Otherwise, look one up in the partitioning operator family. */
- Oid cmp_op = get_opfamily_member(partopfamily,
- leftarg->op_subtype,
- rightarg->op_subtype,
- op->op_strategy);
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
if (OidIsValid(cmp_op))
{
*result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
- op->op->inputcollid,
- leftarg_const,
- rightarg_const));
+ pc->inputcollid,
+ left_value,
+ right_value));
return true;
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ddbbc79..2393d26 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2126,6 +2126,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5007,6 +5026,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 51648c8..3821977 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -867,41 +867,65 @@ get_append_rel_partitions(PlannerInfo *root,
RelOptInfo *rel,
RangeTblEntry *rte)
{
- Relation partrel;
- Bitmapset *partindexes;
- List *result = NIL;
- int i;
+ List *result = NIL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
- partrel = heap_open(rte->relid, NoLock);
+ if (!clauses)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = lappend(result, rel->part_appinfos[i]);
+ return result;
+ }
+ else
+ {
+ Relation partrel;
+ Bitmapset *partindexes;
+ PartitionClauseInfo partclauseinfo;
- partindexes = get_partitions_from_clauses(partrel, rel->relid,
- rel->baserestrictinfo);
+ partrel = heap_open(rte->relid, NoLock);
- /* Fetch the partition appinfos. */
- i = -1;
- while ((i = bms_next_member(partindexes, i)) >= 0)
- {
- AppendRelInfo *appinfo = rel->part_appinfos[i];
+ /* Process clauses and populate partclauseinfo */
+ populate_partition_clause_info(partrel, rel->relid,
+ clauses, &partclauseinfo);
+
+ if (!partclauseinfo.constfalse)
+ {
+ PartitionPruneContext context;
+
+ context.rt_index = rel->relid;
+ context.relation = partrel;
+ context.clauseinfo = &partclauseinfo;
+
+ partindexes = get_partitions_from_clauseinfo(&context);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
#ifdef USE_ASSERT_CHECKING
- PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
- RangeTblEntry *childrte;
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
- childrte = planner_rt_fetch(appinfo->child_relid, root);
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
- /*
- * Must be the intended child's RTE here, because appinfos are ordered
- * the same way as partitions in the partition descriptor.
- */
- Assert(partdesc->oids[i] == childrte->relid);
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
#endif
+ result = lappend(result, appinfo);
+ }
+ }
- result = lappend(result, appinfo);
- }
-
- heap_close(partrel, NoLock);
+ heap_close(partrel, NoLock);
- return result;
+ return result;
+ }
}
/*
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 8423c6e..01f5f0c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,13 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+typedef struct PartitionPruneContext
+{
+ int rt_index;
+ Relation relation;
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
typedef struct PartitionDescData *PartitionDesc;
extern void RelationBuildPartitionDesc(Relation relation);
@@ -74,6 +81,10 @@ extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
/* For partition-pruning */
-extern Bitmapset *get_partitions_from_clauses(Relation relation, int rt_index,
- List *partclauses);
+extern void populate_partition_clause_info(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo);
+extern Bitmapset *get_partitions_from_clauseinfo(
+ PartitionPruneContext *context);
+
#endif /* PARTITION_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2eb3d6d..7630f25 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d7..2a8cc40 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,29 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key. Each matching clause
+ * is stored in the 'keyclauses' list for the partition key index that it was
+ * matched to. Other details are also stored, such as OR clauses and
+ * not-equal (<>) clauses. Nullness properties are also stored.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ List *or_clauses; /* List of clauses found in an OR branch */
+ List *ne_clauses; /* Clauses in the form partkey <> Expr */
+
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* Stored data is known to contain impossible contradictions */
+ bool constfalse;
+ bool foundkeyclauses; /* true if keyclauses contains any items */
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
Hi David.
On 2018/01/23 15:44, David Rowley wrote:
On 19 January 2018 at 04:03, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 18 January 2018 at 23:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
So, I've been assuming that the planner changes in the run-time pruning
patch have to do with selecting clauses (restriction clauses not
containing Consts and/or join clauses) to be passed to the executor by
recording them in the Append node. Will they be selected by the planner
calling into partition.c?
I had thought so. I only have a rough idea in my head and that's that
the PartitionPruneInfo struct that I wrote for the run-time pruning
patch would have the clause List replaced with this new
PartScanClauseInfo struct (which likely means it needs to go into
primnodes.h). This struct would contain all the partition pruning
clauses in a more structured form so that no matching of quals to the
partition key would be required during execution. The idea is that
we'd just need to call remove_redundant_clauses,
extract_bounding_datums and get_partitions_for_keys. I think this is
the bare minimum of work that can be done in execution since we can't
remove the redundant clauses until we know the values of the Params.
Likely this means there will need to be 2 functions externally
accessible for this in partition.c. I'm not sure exactly how best to
do this. Maybe it's fine just to have allpaths.c call
extract_partition_key_clauses to generate the PartScanClauseInfo then
call some version of get_partitions_from_clauses which does do the
extract_partition_key_clauses duties. This is made more complex by the
code that handles adding the default partition bound to the quals. I'm
not yet sure where that should live.
I've also been thinking of having some sort of PartitionPruneContext
struct that we can pass around the functions. Runtime pruning had to
add structs which store the Param values to various functions which I
didn't like. It would be good to just add those to the context and
have them passed down without having to bloat the parameters in the
functions. I might try and do that tomorrow too. This should make the
footprint of the runtime pruning patch a bit smaller.
Attached is what I had in mind about how to do this.
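To make the context idea concrete, here is a minimal standalone sketch, not code from the patch. It assumes a toy model with a single integer partition key and range partitions, and every name in it (DemoClause, DemoPruneContext, demo_prune_eq) is hypothetical. The point is only that Param values can ride along in the context, so the pruning function's signature does not change between plan time and run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause of the form "key = value". */
typedef struct DemoClause
{
	bool		is_param;	/* true: 'value' is an index into param_values */
	int			value;		/* a constant, or a parameter index */
} DemoClause;

/* Hypothetical context: everything the pruning code needs, in one place. */
typedef struct DemoPruneContext
{
	const int  *lower;			/* inclusive lower bound of each partition */
	const int  *upper;			/* exclusive upper bound of each partition */
	int			nparts;
	const int  *param_values;	/* NULL while planning, set at run time */
	int			nparams;
} DemoPruneContext;

/* Resolve a clause's value; fails if it needs a Param we don't have yet. */
static bool
demo_clause_value(const DemoPruneContext *ctx, const DemoClause *c, int *out)
{
	if (!c->is_param)
	{
		*out = c->value;
		return true;
	}
	if (ctx->param_values == NULL || c->value < 0 || c->value >= ctx->nparams)
		return false;
	*out = ctx->param_values[c->value];
	return true;
}

/* Return a bitmask of the partitions that could satisfy "key = clause". */
unsigned
demo_prune_eq(const DemoPruneContext *ctx, const DemoClause *c)
{
	unsigned	all;
	unsigned	result = 0;
	int			v;
	int			i;

	assert(ctx->nparts > 0 && ctx->nparts < 32);
	all = (1u << ctx->nparts) - 1u;

	/* Value unknown in this context: we cannot prune anything. */
	if (!demo_clause_value(ctx, c, &v))
		return all;

	for (i = 0; i < ctx->nparts; i++)
	{
		if (v >= ctx->lower[i] && v < ctx->upper[i])
			result |= 1u << i;
	}
	return result;
}
```

With the same clause, a context whose param_values is NULL keeps every partition, while one carrying the Param values prunes down to the matching partition. No extra function parameter is needed for the run-time case.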
Thanks for the delta patch. I will start looking at it tomorrow.
Regards,
Amit
On 23 January 2018 at 23:22, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/01/23 15:44, David Rowley wrote:
Attached is what I had in mind about how to do this.
Thanks for the delta patch. I will start looking at it tomorrow.
Thanks. I've been looking more at this and I've made a few more
adjustments in the attached.
This delta patch should be applied on top of the
faster_partition_prune_v21_delta_drowley_v1.patch one I sent
yesterday. It changes a few comments, now correctly passes the
context to get_partitions_excluded_by_ne_clauses, and fixes a small
error where the patch was failing to record a notnull for the
partition key when it saw a strict <> clause. It was only doing this
for the opposite case (strict clauses other than <>), but both seem
to be perfectly applicable. I also made a small adjustment to the
regression tests to ensure this is covered.
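The notnull fix can be illustrated with a small standalone sketch. This is a simplified model rather than the patch's code: plain unsigned bitmasks stand in for the keyisnull and keyisnotnull Bitmapsets, and the DemoClauseKind and DemoNullInfo names are made up for the example. It shows that any strict operator clause, <> included, implies the key is not null, and that a conflicting IS NULL makes the whole clause set constant-false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a clause already matched to a key. */
typedef enum DemoClauseKind
{
	DEMO_IS_NULL,
	DEMO_IS_NOT_NULL,
	DEMO_STRICT_OP,				/* e.g. key = 1, key < 5 */
	DEMO_STRICT_NE				/* key <> 1 */
} DemoClauseKind;

typedef struct DemoNullInfo
{
	unsigned	keyisnull;		/* bit i set: key i must be NULL */
	unsigned	keyisnotnull;	/* bit i set: key i must be non-NULL */
	bool		constfalse;		/* a contradiction was found */
} DemoNullInfo;

void
demo_record_clause(DemoNullInfo *info, int keyno, DemoClauseKind kind)
{
	unsigned	bit;

	assert(keyno >= 0 && keyno < 32);
	bit = 1u << keyno;

	/* Once contradictory, stop; the info may be only partially populated. */
	if (info->constfalse)
		return;

	if (kind == DEMO_IS_NULL)
	{
		/* check for conflicting IS NOT NULLs or strict clauses */
		if (info->keyisnotnull & bit)
		{
			info->constfalse = true;
			return;
		}
		info->keyisnull |= bit;
	}
	else
	{
		/*
		 * IS NOT NULL and every strict operator, including <>, can only be
		 * true for a non-NULL key, so check for a contradicting IS NULL.
		 */
		if (info->keyisnull & bit)
		{
			info->constfalse = true;
			return;
		}
		info->keyisnotnull |= bit;
	}
}
```

Treating DEMO_STRICT_NE the same as DEMO_STRICT_OP is the whole point here: a query such as "key <> 1 AND key IS NULL" is detected as unsatisfiable regardless of the order in which the two clauses are seen.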
I'm now going to start work on basing the partition pruning patch on
top of this.
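For readers following along, the counting approach described in the comments of get_partitions_excluded_by_ne_clauses can be sketched roughly as below. This is a toy model under stated assumptions, not the function itself: list datums are represented only by the index of the partition that accepts them, a bool array stands in for the set of distinct datums seen in <> clauses, and the function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PARTS 32

/*
 * part_of_datum[d] is the partition index accepting list datum d, and
 * ne_seen[d] is true if the query contains "key <> datum d".  Returns a
 * bitmask of partitions whose every permitted datum has been excluded.
 */
unsigned
demo_parts_excluded_by_ne(const int *part_of_datum, const bool *ne_seen,
						  int ndatums, int nparts)
{
	int			datums_in_part[DEMO_MAX_PARTS] = {0};
	int			datums_found[DEMO_MAX_PARTS] = {0};
	unsigned	excluded = 0;
	int			d;
	int			i;

	assert(nparts > 0 && nparts < DEMO_MAX_PARTS);

	for (d = 0; d < ndatums; d++)
	{
		int			p = part_of_datum[d];

		assert(p >= 0 && p < nparts);
		datums_in_part[p]++;	/* datums permitted in this partition */
		if (ne_seen[d])
			datums_found[p]++;	/* distinct datums ruled out by the query */
	}

	for (i = 0; i < nparts; i++)
	{
		/*
		 * A partition with zero in both arrays (such as a default
		 * partition) must not be eliminated, hence the > 0 test.
		 */
		if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
			excluded |= 1u << i;
	}
	return excluded;
}
```

For a partition accepting the values 1 and 2, the clauses "key <> 1 AND key <> 2" exclude it, while "key <> 1" alone does not.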
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
faster_partition_prune_v21_delta_drowley_v1_delta.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ad789f6..b86dada 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -290,7 +290,8 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
-static Bitmapset *get_partitions_excluded_by_ne_clauses(Relation relation,
+static Bitmapset *get_partitions_excluded_by_ne_clauses(
+ PartitionPruneContext *context,
List *ne_clauses);
static Bitmapset *get_partitions_from_or_clause_args(
PartitionPruneContext *context,
@@ -1691,10 +1692,10 @@ get_partition_qual_relid(Oid relid)
/*
* populate_partition_clause_info
* Processes 'clauses' to try to match them to relation's partition
- * keys. If any clauses are found which match a partition key, then
- * these clauses are stored in 'partclauseinfo'.
+ * keys. If any compatible clauses are found which match a partition
+ * key, then these clauses are stored in 'partclauseinfo'.
*
- * The caller must ensure that 'clauses' is not an empty List. Upon return,
+ * The caller must ensure that 'clauses' is not an empty List. Upon return,
* callers must also check if the 'partclauseinfo' constfalse has been set, if
* so, then they must be aware that the 'partclauseinfo' may only be partially
* populated.
@@ -1747,9 +1748,9 @@ populate_partition_clause_info(Relation relation,
}
/*
- * get_partitions_from_clauses
- * Determine all partitions of the context 'relation' that could possibly
- * contain a record that matches the context 'clauseinfo'
+ * get_partitions_from_clauseinfo
+ * Determine all partitions of the context's 'relation' that could
+ * possibly contain a record that matches the context's 'clauseinfo'
*
* Returns a Bitmapset of the matching partition indexes, or NULL if none can
* match.
@@ -1823,7 +1824,7 @@ get_partitions_from_clauseinfo(PartitionPruneContext *context)
{
Bitmapset *ne_parts;
- ne_parts = get_partitions_excluded_by_ne_clauses(relation,
+ ne_parts = get_partitions_excluded_by_ne_clauses(context,
partclauseinfo->ne_clauses);
/* Remove any partitions we found to not be needed */
@@ -1867,11 +1868,13 @@ get_partitions_from_clauseinfo(PartitionPruneContext *context)
* possible values that the partition can contain.
*/
static Bitmapset *
-get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
+get_partitions_excluded_by_ne_clauses(PartitionPruneContext *context,
+ List *ne_clauses)
{
ListCell *lc;
- Bitmapset *excluded_parts = NULL;
+ Bitmapset *excluded_parts;
Bitmapset *foundoffsets = NULL;
+ Relation relation = context->relation;
PartitionKey partkey = RelationGetPartitionKey(relation);
PartitionDesc partdesc = RelationGetPartitionDesc(relation);
PartitionBoundInfo boundinfo = partdesc->boundinfo;
@@ -1922,10 +1925,10 @@ get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
* the entire partition.
*
* We'll need two arrays for this, one to count the number of unique
- * datums we found in the query, and another to record the number of
- * datums permitted in each partition. Once we've counted all this, we
- * can eliminate any partition where the number of datums found matches
- * the number of datums allowed in the partition.
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
*/
datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
@@ -1947,6 +1950,8 @@ get_partitions_excluded_by_ne_clauses(Relation relation, List *ne_clauses)
* eliminate the default partition. We can recognize that by it having a
* zero value in both arrays.
*/
+ excluded_parts = NULL;
+
for (i = 0; i < partdesc->nparts; i++)
{
if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
@@ -2183,7 +2188,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (!op_strict(opclause->opno))
continue;
- /* Useless if the "constant" can change its value. */
+ /* We can't use any volatile value to prune partitions. */
if (contain_volatile_functions((Node *) valueexpr))
continue;
@@ -2191,7 +2196,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
* Handle cases where the clause's operator does not belong to
* the partitioning operator family. We currently handle two
* such cases: 1. Operators named '<>' are not listed in any
- * operator family whatsoever, 2. Ordering opertors like '<'
+ * operator family whatsoever, 2. Ordering operators like '<'
* are not listed in the hash operator families. For 1, check
* if list partitioning is in use and if so, proceed to pass
* the clause to the caller without doing any more processing
@@ -2200,10 +2205,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
*/
if (!op_in_opfamily(opclause->opno, partopfamily))
{
- int strategy;
- Oid negator,
- lefttype,
- righttype;
+ Oid negator;
/*
* To confirm if the operator is really '<>', check if its
@@ -2215,10 +2217,15 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (OidIsValid(negator) &&
op_in_opfamily(negator, partopfamily))
{
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
get_op_opfamily_properties(negator, partopfamily,
false,
&strategy,
&lefttype, &righttype);
+
if (strategy == BTEqualStrategyNumber &&
partkey->strategy == PARTITION_STRATEGY_LIST)
is_ne_listp = true;
@@ -2244,25 +2251,24 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
lappend(partclauseinfo->ne_clauses,
pc);
else
- {
partclauseinfo->keyclauses[i] =
lappend(partclauseinfo->keyclauses[i],
pc);
- partclauseinfo->foundkeyclauses = true;
- /*
- * Since we only allow strict operators, require keys to
- * be not null.
- */
- if (bms_is_member(i, partclauseinfo->keyisnull))
- {
- partclauseinfo->constfalse = true;
- return;
- }
- partclauseinfo->keyisnotnull =
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
bms_add_member(partclauseinfo->keyisnotnull,
i);
- }
+ partclauseinfo->foundkeyclauses = true;
}
else if (IsA(clause, ScalarArrayOpExpr))
{
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3821977..d4bf973 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -23,7 +23,6 @@
#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
-#include "catalog/pg_opfamily.h"
#include "catalog/pg_proc.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 2a8cc40..75828a4 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1510,7 +1510,7 @@ typedef struct OnConflictExpr
* PartitionClauseInfo
*
* Stores clauses which were matched to a partition key. Each matching clause
- * is stored in the 'clauses' list for the partition key index that it was
+ * is stored in the 'keyclauses' list for the partition key index that it was
* matched to. Other details are also stored, such as OR clauses and
* not-equal (<>) clauses. Nullness properties are also stored.
*----------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 855d51e..4caeaa7 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -343,9 +343,8 @@ typedef struct PlannerInfo
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
*
- * Since partitioning might be using a collation for a given partition key
- * column that is not same as the collation implied by column's type, store
- * the same separately.
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 83e6081..bc9ff38 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1366,38 +1366,30 @@ explain (costs off) select * from rp where a <> 1 and a <> 2;
Filter: ((a <> 1) AND (a <> 2))
(7 rows)
--- various cases for list partitioning where pruning should work
-explain (costs off) select * from lp where a <> 'a' and a is not null;
- QUERY PLAN
-----------------------------------------------------------
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
Append
-> Seq Scan on lp_ad
- Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ Filter: (a <> 'a'::bpchar)
-> Seq Scan on lp_bc
- Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ Filter: (a <> 'a'::bpchar)
-> Seq Scan on lp_ef
- Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ Filter: (a <> 'a'::bpchar)
-> Seq Scan on lp_g
- Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ Filter: (a <> 'a'::bpchar)
-> Seq Scan on lp_default
- Filter: ((a IS NOT NULL) AND (a <> 'a'::bpchar))
+ Filter: (a <> 'a'::bpchar)
(11 rows)
-explain (costs off) select * from lp where a <> 'a' and a <> 'a';
- QUERY PLAN
--------------------------------------------------------------
- Append
- -> Seq Scan on lp_ad
- Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
- -> Seq Scan on lp_bc
- Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
- -> Seq Scan on lp_ef
- Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
- -> Seq Scan on lp_g
- Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
- -> Seq Scan on lp_default
- Filter: ((a <> 'a'::bpchar) AND (a <> 'a'::bpchar))
-(11 rows)
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
QUERY PLAN
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 13b1207..b7c5abf 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -216,9 +216,12 @@ create table rp2 partition of rp for values from (2) to (maxvalue);
explain (costs off) select * from rp where a <> 1;
explain (costs off) select * from rp where a <> 1 and a <> 2;
--- various cases for list partitioning where pruning should work
-explain (costs off) select * from lp where a <> 'a' and a is not null;
-explain (costs off) select * from lp where a <> 'a' and a <> 'a';
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
-- case for list partitioned table that's not root
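To make the counting scheme described in the comment of get_partitions_excluded_by_ne_clauses() concrete, here is a minimal standalone sketch. It is not the patch's code: ListPart, parts_excluded_by_ne() and the plain 32-bit mask standing in for a Bitmapset are all hypothetical, and it assumes integer datums that are distinct within a partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one list partition: the datums it accepts. */
typedef struct ListPart
{
    const int  *datums;
    int         ndatums;    /* 0 models a default partition in this sketch */
} ListPart;

/*
 * Return a bitmask of the partitions that "key <> v" clauses exclude.
 * A partition is excluded only when every datum it accepts has been
 * ruled out by some <> clause.
 */
uint32_t
parts_excluded_by_ne(const ListPart *parts, int nparts,
                     const int *ne_values, int nvalues)
{
    uint32_t    excluded = 0;

    assert(nparts <= 32);
    for (int p = 0; p < nparts; p++)
    {
        int         found = 0;

        /* Count each partition datum at most once, even if repeated in the query. */
        for (int d = 0; d < parts[p].ndatums; d++)
        {
            for (int v = 0; v < nvalues; v++)
            {
                if (ne_values[v] == parts[p].datums[d])
                {
                    found++;
                    break;
                }
            }
        }

        /* "found > 0" keeps the default partition, which has zero in both counts. */
        if (found > 0 && found >= parts[p].ndatums)
            excluded |= (uint32_t) 1 << p;
    }
    return excluded;
}
```

Counting per partition datum rather than per query constant means a repeated clause such as a <> 'a' and a <> 'a' cannot inflate the count, which is the situation the reworded comment is concerned with.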
Hi David.
I looked at both of your faster_partition_prune_v21_delta_drowley_v1.patch
and faster_partition_prune_v21_delta_drowley_v1_delta.patch and have
incorporated them into the main patch series to get us the attached v22
set. I like the result very much. Thank you!
On 2018/01/23 15:44, David Rowley wrote:
Attached is what I had in mind about how to do this. Only the planner
will need to call populate_partition_clause_info. The planner and
executor can call get_partitions_from_clauseinfo. I'll just need to
change the run-time prune patch to pass the PartitionClauseInfo into
the executor in the Append node.
I liked the division into those two functions, although not quite the
embedding of "info" into the function names. I think it's good to just
call them populate_partition_clauses and get_partitions_from_clauses. It
seems OK though for their arguments to contain "info" in their names.
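As a rough illustration of why the two-function split is convenient, here is a standalone sketch with a single integer key and only three clause shapes. Everything in it (SimpleClause, ClauseSummary, summarize_clauses(), partitions_from_summary()) is a hypothetical simplification and not the signatures in the patch; the point is only that the clause list is digested once, and the cheap second step can be repeated by whoever holds the summary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CL_EQ, CL_IS_NULL, CL_IS_NOT_NULL } ClauseKind;

typedef struct SimpleClause
{
    ClauseKind  kind;
    int         value;          /* used only by CL_EQ */
} SimpleClause;

typedef struct ClauseSummary
{
    bool        has_eq;
    int         eq_value;
    bool        keyisnull;
    bool        keyisnotnull;
    bool        constfalse;     /* clauses contradict each other */
} ClauseSummary;

typedef struct ListPart
{
    const int  *values;
    int         nvalues;
    bool        accepts_null;
} ListPart;

/* Step 1 (planner only): digest the clause list exactly once. */
void
summarize_clauses(const SimpleClause *clauses, int nclauses, ClauseSummary *out)
{
    *out = (ClauseSummary) {0};
    for (int i = 0; i < nclauses && !out->constfalse; i++)
    {
        const SimpleClause *c = &clauses[i];

        switch (c->kind)
        {
            case CL_EQ:
                /* "=" is strict, so it contradicts an earlier IS NULL. */
                if (out->keyisnull ||
                    (out->has_eq && out->eq_value != c->value))
                    out->constfalse = true;
                out->has_eq = true;
                out->eq_value = c->value;
                out->keyisnotnull = true;
                break;
            case CL_IS_NULL:
                if (out->keyisnotnull)
                    out->constfalse = true;
                out->keyisnull = true;
                break;
            case CL_IS_NOT_NULL:
                if (out->keyisnull)
                    out->constfalse = true;
                out->keyisnotnull = true;
                break;
        }
    }
}

/* Step 2 (planner or executor): map the summary to a partition bitmask. */
uint32_t
partitions_from_summary(const ListPart *parts, int nparts,
                        const ClauseSummary *s)
{
    uint32_t    result = 0;

    assert(nparts <= 32);
    if (s->constfalse)
        return 0;
    for (int p = 0; p < nparts; p++)
    {
        bool        match = true;   /* nothing usable: keep the partition */

        if (s->keyisnull)
            match = parts[p].accepts_null;
        else if (s->has_eq)
        {
            match = false;
            for (int v = 0; v < parts[p].nvalues; v++)
                if (parts[p].values[v] == s->eq_value)
                    match = true;
        }
        if (match)
            result |= (uint32_t) 1 << p;
    }
    return result;
}
```

In this sketch a lone IS NOT NULL clause prunes nothing, which is deliberately conservative; the real code may well do better.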
I've also added the context struct that I mentioned above. It's
currently not carrying much weight, but the idea is that this context
will be passed around a bit more with the run-time pruning patch and
it will also carry the details about parameter values. I'll need to
modify a few signatures of functions like partkey_datum_from_expr to
pass the context there too. I didn't do that here because currently,
those functions have no use for the context with the fields that they
currently have.
The context struct seems good to me too.
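For what it's worth, here is a small standalone sketch of how such a context could later carry parameter values without changing function signatures again. The struct layout, KeyValue, and key_value_from_expr() are assumptions made up for illustration; they are not the patch's PartitionPruneContext nor the real signature of partkey_datum_from_expr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pruning context; param_values is NULL at plan time. */
typedef struct PruneContext
{
    int         rt_index;       /* relation being pruned */
    const int  *param_values;   /* run-time parameter values, if known */
    int         nparams;
} PruneContext;

/* Hypothetical value compared to the partition key. */
typedef struct KeyValue
{
    bool        is_param;       /* false: use constval; true: use param_id */
    int         constval;
    int         param_id;
} KeyValue;

/*
 * Returns true and sets *result if the value is known in this context.
 * A parameter is only resolvable once the context carries its value.
 */
bool
key_value_from_expr(const PruneContext *ctx, const KeyValue *kv, int *result)
{
    if (!kv->is_param)
    {
        *result = kv->constval;
        return true;
    }
    if (ctx->param_values == NULL ||
        kv->param_id < 0 || kv->param_id >= ctx->nparams)
        return false;
    *result = ctx->param_values[kv->param_id];
    return true;
}
```

With this shape, the same lookup function serves both the planner (constants only) and a later run-time caller (constants and parameters); only the contents of the context differ.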
I've also fixed a bug where, when you built the commutator OpExpr in
what's now called extract_partition_key_clauses, the inputcollid was
not being properly set. The changes I made there go much further than
just that: I've completely removed the OpExpr from the PartClause
struct, as only two of its members were ever used. I thought it was
better just to add those to PartClause instead.
I wondered if we should rename that to something like
PartClauseProperties, but maybe that's too long.
I also did some renaming of variables that always assumed that the
Expr being compared to the partition key was a constant. This is true
now, but with run-time pruning patch, it won't be, so I thought it was
better to do this here rather than having to rename them in the
run-time pruning patch.
OK, seems fine.
One thing I don't yet understand about the patch is the use of
predicate_refuted_by() in get_partitions_from_or_clause_args(). I did
adjust the comment above that code, but I'm still not certain I fully
understand why that function has to be used instead of building the
clauses for the OR being processed along with the remaining clauses.
Was it too hard to extract those clauses, so that you ended up using
predicate_refuted_by()?
I have tried to explain that better in the updated comment in the new
patch, along with some code rearrangement to better make sense of what's
going on. Let me just copy-paste the new comment I wrote. I have tried
to rethink the solution a number of times but never came up with a
sensible alternative.
/*
* When matching an OR expression, it is only checked if at least one of
* its args matches the partition key, not all. For arguments that don't
* match, we cannot eliminate any of its partitions using
* get_partitions_from_clauses(). However, if the table is itself a
* partition, we may be able to prove using constraint exclusion that the
* clause refutes its partition constraint, that is, we can eliminate all
* of its partitions.
*/
foreach(lc, or_clause_args)
{
I've also removed the makeBoolExpr call that you were encapsulating
the or_clauses in. I didn't really see the need for this since you
just removed it again when looping over the or_clauses.
Ah, OK. At first I was concerned that you were adding the arguments of
different OR expressions into a single list called or_clauses, but I
calmed down after checking that that's not the case. :)
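To illustrate the OR handling described in the comment quoted above, here is a toy standalone model. It assumes a table that is itself a partition (its own constraint is parent_col in [parent_lo, parent_hi)) and is range-partitioned on another integer key; OrArm, SubPartitioned and partitions_for_or() are hypothetical names, and the simple range check stands in for what predicate_refuted_by() proves in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct OrArm
{
    bool        on_partkey;     /* true: "key = value"; false: "parent_col = value" */
    int         value;
} OrArm;

typedef struct SubPartitioned
{
    int         parent_lo;      /* own partition constraint: */
    int         parent_hi;      /* parent_col in [parent_lo, parent_hi) */
    const int  *bounds;         /* nparts + 1 ascending bounds on key */
    int         nparts;
} SubPartitioned;

static uint32_t
all_parts(int nparts)
{
    return nparts == 32 ? UINT32_MAX : (((uint32_t) 1 << nparts) - 1);
}

static uint32_t
parts_for_key_eq(const SubPartitioned *t, int v)
{
    for (int p = 0; p < t->nparts; p++)
        if (v >= t->bounds[p] && v < t->bounds[p + 1])
            return (uint32_t) 1 << p;
    return 0;
}

/* Union of the partitions needed by each arm of one OR expression. */
uint32_t
partitions_for_or(const SubPartitioned *t, const OrArm *arms, int narms)
{
    uint32_t    result = 0;

    assert(t->nparts >= 1 && t->nparts <= 32);
    for (int i = 0; i < narms; i++)
    {
        if (arms[i].on_partkey)
            result |= parts_for_key_eq(t, arms[i].value);
        else if (arms[i].value < t->parent_lo ||
                 arms[i].value >= t->parent_hi)
            continue;           /* arm refutes the table's own constraint */
        else
            result |= all_parts(t->nparts); /* cannot prune with this arm */
    }
    return result;
}
```

The middle branch is the interesting one: an arm that does not mention the partition key would normally force every partition to be scanned, unless it can never be true for this table at all.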
The only other changes are just streamlining code and comment changes.
I made a few of those myself in the updated patches.
Thanks a lot again for your work on this.
Regards,
Amit
Attachments:
v22-0004-More-refactoring-around-partitioned-table-Append.patch (text/plain; charset=UTF-8)
From dedc751e10aaeecf5ab7ce822c9f79643c3e12b3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v22 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 14 ++++
src/include/nodes/relation.h | 25 ++++++-
4 files changed, 122 insertions(+), 56 deletions(-)
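The difference between the two lookup styles described in the commit message above can be sketched in standalone form as follows. AppInfo is a cut-down, hypothetical stand-in for AppendRelInfo, and the 32-bit mask stands in for the set of unpruned partition indexes; neither function is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct AppInfo
{
    int         parent_relid;
    int         child_relid;
} AppInfo;

/* Old style: walk the global list and filter by parent. */
int
children_by_list_scan(const AppInfo *all, int nall, int parent_relid,
                      const AppInfo **out)
{
    int         n = 0;

    for (int i = 0; i < nall; i++)
        if (all[i].parent_relid == parent_relid)
            out[n++] = &all[i];
    return n;
}

/*
 * New style: the parent keeps its children in an array in bound order,
 * so a set of surviving partition indexes maps directly to appinfos.
 */
int
live_children_by_index(const AppInfo *const *part_appinfos, int nparts,
                       uint32_t live_parts, const AppInfo **out)
{
    int         n = 0;

    assert(nparts <= 32);
    for (int i = 0; i < nparts; i++)
        if (live_parts & ((uint32_t) 1 << i))
            out[n++] = part_appinfos[i];
    return n;
}
```

The per-parent array only pays off because its order matches the partition bound order, which is what lets a pruning step that returns partition indexes pick appinfos without any searching.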
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd68374e20..8f761a77e8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * An AppendPath generated for a partitioned table must record the RT
+ * indexes of partitioned tables that are direct or indirect children of
+ * this Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so in the loop below collect
+ * those of the children that are themselves partitioned and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 53870432ea..e159800063 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6202,14 +6202,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..f3b9a2be32 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,9 +575,12 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -734,9 +745,12 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6bf68f31da..25333c5407 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,8 +529,12 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * live_part_appinfos - AppendRelInfo of unpruned partitions
+ * live_partitioned_rels - RT indexes of unpruned partitions that are
+ * partitioned tables themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -657,10 +661,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
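The rewritten get_partitioned_child_rels_for_join() in the patch above walks the members of join_relids and concatenates each rel's live_partitioned_rels. A minimal standalone sketch of that loop shape follows; Rel is a hypothetical cut-down RelOptInfo, a uint64_t stands in for Relids, and next_member() is a made-up helper that only imitates the iteration idiom used with bms_next_member().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Rel
{
    int         relid;
    const int  *live_partitioned_rels;
    int         nlive;
} Rel;

/* Smallest member of set greater than prev, or a negative value if none. */
int
next_member(uint64_t set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
        if (set & ((uint64_t) 1 << i))
            return i;
    return -1;
}

/* Collect live partitioned rels of every base rel in the join; returns the count. */
int
partitioned_rels_for_join(const Rel *const *rel_array, int array_size,
                          uint64_t join_relids, int *out, int out_cap)
{
    int         n = 0;
    int         relid = -1;

    while ((relid = next_member(join_relids, relid)) >= 0)
    {
        const Rel  *rel;

        /* Paranoia: ignore indexes beyond the array and empty slots. */
        if (relid >= array_size)
            continue;
        rel = rel_array[relid];
        if (rel == NULL)
            continue;
        assert(rel->relid == relid);
        for (int i = 0; i < rel->nlive && n < out_cap; i++)
            out[n++] = rel->live_partitioned_rels[i];
    }
    return n;
}
```

Unlike the patch, this sketch does not assert that each rel is partitioned or that its list is non-empty; it just shows the iteration and concatenation.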
v22-0005-Teach-planner-to-use-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From d23d49b849a8ae2a3ca301424d5733e67fc5cf78 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v22 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
involves applying constraint exclusion against the partition
constraint of each partition, which works by comparing a query's
clauses against the partition constraint and excluding a partition if
the clauses refute the latter. A dummy path is added for each
partition that is excluded. This algorithm takes linear time with a
big constant, especially given that we repeat the work of matching
clauses to the partition constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using populate_partition_clauses().
Then, if we pass the clauses to get_partitions_from_clauses(), we'll
get the set of matching partitions in much less time than it takes to
run the matching algorithm separately for each partition.
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/optimizer/path/allpaths.c | 80 ++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 6 +-
src/test/regress/expected/inherit.out | 8 +-
src/test/regress/expected/partition_prune.out | 434 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 77 ++++-
7 files changed, 592 insertions(+), 78 deletions(-)
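As a rough picture of the cost argument in the commit message above, the sketch below contrasts a per-partition check with a single binary search over sorted range bounds for a "key = v" clause. It is a simplification under stated assumptions: integer keys, partition i covering [bounds[i], bounds[i+1]), and hypothetical function names. Real constraint exclusion does considerably more work per partition than the linear loop here.

```c
#include <assert.h>

/* Per-partition test, in the spirit of checking each partition in turn. */
int
partition_for_eq_linear(const int *bounds, int nparts, int v)
{
    for (int p = 0; p < nparts; p++)
        if (v >= bounds[p] && v < bounds[p + 1])
            return p;
    return -1;
}

/*
 * One lookup against the parent's sorted bounds: find the partition whose
 * range contains v in O(log nparts) steps, or -1 if none does.
 */
int
partition_for_eq(const int *bounds, int nparts, int v)
{
    int         lo = 0;
    int         hi = nparts;

    assert(nparts >= 1);
    if (v < bounds[0] || v >= bounds[nparts])
        return -1;

    /* Invariant: bounds[lo] <= v < bounds[hi]. */
    while (hi - lo > 1)
    {
        int         mid = lo + (hi - lo) / 2;

        if (v >= bounds[mid])
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Both functions return the same partition index; the second one simply never looks at partitions that the bound ordering already rules out.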
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8f761a77e8..af9658128e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -136,6 +137,9 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
/*
@@ -847,6 +851,77 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *result = NIL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (!clauses)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = lappend(result, rel->part_appinfos[i]);
+ }
+ else
+ {
+ Relation partrel;
+ Bitmapset *partindexes;
+ PartitionClauseInfo partclauseinfo;
+
+ partrel = heap_open(rte->relid, NoLock);
+
+ /* Process clauses and populate partclauseinfo */
+ populate_partition_clauses(partrel, rel->relid,
+ clauses, &partclauseinfo);
+
+ if (!partclauseinfo.constfalse)
+ {
+ PartitionPruneContext context;
+
+ context.rt_index = rel->relid;
+ context.relation = partrel;
+ context.clauseinfo = &partclauseinfo;
+
+ partindexes = get_partitions_from_clauses(&context);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ heap_close(partrel, NoLock);
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +963,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a35d068911..6949886e46 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1395,6 +1395,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..c1d4c7db5b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1930,6 +1939,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 25333c5407..5e1d4151c2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +352,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..2072766efd 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,9 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-> Seq Scan on part_null_xy
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(5 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
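A note on the hash-partitioning tests added to partition_prune.sql in the patch above: they state that partial keys won't prune, and neither do non-equality conditions. The following is only a toy sketch of why that would be so, assuming the target partition is derived from a hash combined over all key columns. Every name here (ToyKeyCond, toy_hash_prune) and the combining formula are hypothetical and are not taken from the patch; the choice to let a NULL column contribute nothing to the hash is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what the WHERE clause tells us about one key column. */
typedef struct ToyKeyCond
{
    bool        known;      /* "col = const" or "col IS NULL" was given */
    bool        isnull;     /* valid only if known */
    uint64_t    hash;       /* hash of the constant; valid if known && !isnull */
} ToyKeyCond;

/*
 * Return the remainder identifying the single partition to scan, or -1 if
 * no partition can be excluded because some key column is unconstrained
 * (or constrained only by something other than equality / IS NULL).
 */
static int
toy_hash_prune(const ToyKeyCond *conds, int nkeys, int modulus)
{
    uint64_t    rowhash = 0;
    int         i;

    assert(modulus > 0);
    for (i = 0; i < nkeys; i++)
    {
        /* One unknown column leaves the combined hash undetermined. */
        if (!conds[i].known)
            return -1;

        /* Toy combining step; assumed, not the real hash function. */
        if (!conds[i].isnull)
            rowhash = rowhash * 31 + conds[i].hash;
    }
    return (int) (rowhash % (uint64_t) modulus);
}
```

With two key columns, a condition on only one of them makes the function return -1, which corresponds to the plans above that scan hp0 through hp3; once both columns are pinned by equality or IS NULL, a single remainder falls out.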
Attachment: v22-0001-Some-interface-changes-for-partition_bound_-cmp-.patch (text/plain; charset=UTF-8)
From 0ff4bad7f9faf5c4eaefad4c867961289c0631e4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v22 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare a caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 166 +++++++++++++++++++++++++++++-----------
1 file changed, 123 insertions(+), 43 deletions(-)
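
To illustrate the interface change described in the commit message, here is a minimal standalone sketch, not the patch's code. It assumes integer-only keys and uses hypothetical names (ToyBound, ToyCmpArg, toy_bound_cmp, toy_bound_bsearch) in place of the real PartitionBoundCmpArg machinery. The point it tries to show is that once the probe carries its own ndatums, the comparison can stop after a prefix of the key columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_KEYS 3              /* hypothetical limit on key columns */

/* Hypothetical stand-in for a stored range bound: one int per key column. */
typedef struct ToyBound
{
    int         datums[MAX_KEYS];
} ToyBound;

/* Simplified analogue of PartitionBoundCmpArg. */
typedef struct ToyCmpArg
{
    bool        is_bound;       /* true: probe is *bound; false: use datums */
    const ToyBound *bound;
    const int  *datums;
    int         ndatums;        /* number of valid entries in datums */
} ToyCmpArg;

/* Return <0, 0 or >0 as the stored bound is <, = or > the probe in *arg. */
static int
toy_bound_cmp(const ToyBound *stored, int nkeys, const ToyCmpArg *arg)
{
    const int  *probe = arg->is_bound ? arg->bound->datums : arg->datums;
    int         n = arg->is_bound ? nkeys : arg->ndatums;
    int         i;

    assert(n >= 0 && n <= nkeys);

    /* Compare only the first n columns; the rest are ignored. */
    for (i = 0; i < n; i++)
    {
        if (stored->datums[i] < probe[i])
            return -1;
        if (stored->datums[i] > probe[i])
            return 1;
    }
    return 0;
}

/*
 * Binary search over bounds[] (sorted ascending).  Returns the greatest
 * index whose bound is <= the probe, or -1 if every bound is greater.
 * *is_equal reports whether the bound at the returned index compared equal.
 */
static int
toy_bound_bsearch(const ToyBound *bounds, int nbounds, int nkeys,
                  const ToyCmpArg *arg, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;
        int         cmpval = toy_bound_cmp(&bounds[mid], nkeys, arg);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

A caller routing a full tuple would set ndatums to the number of key columns, while a caller that only knows a prefix of the key would pass a smaller ndatums; with a prefix probe several bounds may compare equal, and this sketch returns the last of them.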
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..1edbf66eae 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg may contain either a partition bound struct or a Datum array
+ * representing the partition key of a tuple being routed. We simply pass
+ * that down to partition_bound_cmp where it is interpreted appropriately.
*
- * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *is_equal is set to whether the bound at the returned index is exactly
+ * equal to *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
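For anyone reading along, the search loop that partition_bound_bsearch() runs can be pictured with a much smaller stand-in. The sketch below is only an illustration under stated assumptions: the function name bound_bsearch_sketch and the use of a sorted int array in place of PartitionBoundInfo are hypothetical, and the comparison is a plain integer compare rather than partition_bound_cmp(). It returns the index of the greatest bound that is less than or equal to the probe, or -1 if every bound is greater.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for partition_bound_bsearch(): 'bounds' is a sorted
 * int array instead of a PartitionBoundInfo, and 'probe' is a single int.
 */
static int
bound_bsearch_sketch(const int *bounds, int nbounds, int probe, bool *is_equal)
{
	int		lo = -1;
	int		hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;

	while (lo < hi)
	{
		/* Round up so that lo = mid always makes progress. */
		int		mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= probe)
		{
			/* bounds[mid] is a candidate answer; look to its right. */
			lo = mid;
			*is_equal = (bounds[mid] == probe);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}

	return lo;
}
```

Starting lo at -1 is what lets the same loop report "all bounds are greater" without a separate check.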
Attachment: v22-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 28d4e6d60901e5bed7e970ce286a8645fa73506a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v22 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns the index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of clauses and return
a set of indexes of the partitions that satisfy all of those clauses.
The aforementioned list of clauses must contain all clauses that were
matched to the partition key(s) using populate_partition_clauses().
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
---
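A note on the <> handling for list partitions, since it is the least obvious part: a partition can be skipped only when the query has a <> clause for every value the partition accepts. The counting idea can be sketched as below. This is a simplified illustration, not the code in the patch: the name exclude_by_ne_sketch and the plain bool/int arrays are hypothetical stand-ins for Bitmapsets and boundinfo->indexes, and the separate treatment of the NULL-accepting partition is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * indexes[k]  - partition that accepts the k-th list bound value
 * found[k]    - true if the query contains "key <> value_k"
 * excluded[p] - set to true for each partition that can be skipped
 *
 * All names here are hypothetical; arrays stand in for Bitmapsets.
 */
static void
exclude_by_ne_sketch(const int *indexes, const bool *found, int ndatums,
					 int nparts, bool *excluded)
{
	int	   *datums_in_part;
	int	   *datums_found;
	int		i;

	assert(nparts > 0);
	datums_in_part = calloc(nparts, sizeof(int));
	datums_found = calloc(nparts, sizeof(int));
	if (datums_in_part == NULL || datums_found == NULL)
		abort();

	/* Count values permitted per partition and values hit by <> clauses. */
	for (i = 0; i < ndatums; i++)
	{
		datums_in_part[indexes[i]]++;
		if (found[i])
			datums_found[indexes[i]]++;
	}

	/*
	 * A partition is excluded only if every permitted value was matched.
	 * The default partition has zero in both counters and is never excluded.
	 */
	for (i = 0; i < nparts; i++)
		excluded[i] = (datums_found[i] > 0 &&
					   datums_found[i] >= datums_in_part[i]);

	free(datums_in_part);
	free(datums_found);
}
```

For example, with p0 FOR VALUES IN (1, 2), p1 FOR VALUES IN (3) and a default partition p2, the clauses "a <> 1 AND a <> 3" cover all of p1's values but only one of p0's, so only p1 is removed.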
src/backend/catalog/partition.c | 2096 ++++++++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 13 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 31 +
src/include/optimizer/clauses.h | 2 +
8 files changed, 2169 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1edbf66eae..22a4b743d6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,81 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality lookup key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +290,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(
+ PartitionPruneContext *context,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(
+ PartitionPruneContext *context,
+ List *or_clause_args);
+static void extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index, PartitionClauseInfo *partclauses);
+static bool extract_bounding_datums(PartitionKey partkey,
+ PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *pc,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static void remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1689,1997 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * populate_partition_clauses
+ * Processes 'clauses' to try to match them to relation's partition
+ * keys. If any compatible clauses are found which match a partition
+ * key, then these clauses are stored in 'partclauseinfo'.
+ *
+ * The caller must ensure that 'clauses' is not an empty List. Upon return,
+ * callers must also check whether partclauseinfo->constfalse has been set;
+ * if so, they must be aware that 'partclauseinfo' may be only partially
+ * populated.
+ */
+void
+populate_partition_clauses(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo)
+{
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+ PartitionBoundInfo boundinfo;
+
+ Assert(clauses != NIL);
+
+ partkey = RelationGetPartitionKey(relation);
+ partdesc = RelationGetPartitionDesc(relation);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(clauses);
+ boundinfo = partdesc->boundinfo;
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement
+ * is perhaps unlikely for non-default partitions, but it may be more
+ * likely in the case of default partitions, so we'll add the parent
+ * partition table's partition qual to the clause list in this case only.
+ * This may result in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ List *partqual = RelationGetPartitionQual(relation);
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partqual, 1, rt_index, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ extract_partition_key_clauses(partkey, clauses, rt_index, partclauseinfo);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine all partitions of context->relation that could possibly
+ * contain a record that matches clauses as described in
+ * context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartitionDesc partdesc;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+
+ /*
+ * Check if there were proofs that no partitions can match due to some
+ * clause items contradicting another.
+ */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ partdesc = RelationGetPartitionDesc(context->relation);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ else
+ {
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+
+ /* collapse clauses down to the most restrictive set */
+ remove_redundant_clauses(partkey, context);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(partkey, context, &keys))
+ {
+ result = get_partitions_for_keys(context->relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have the values we'd need to eliminate
+ * partitions using get_partitions_for_keys, likely because
+ * context->clauseinfo only contained <> clauses and/or OR
+ * clauses, which are handled further below in this function.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ }
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (partclauseinfo->ne_clauses)
+ {
+ Bitmapset *ne_parts;
+
+ ne_parts = get_partitions_excluded_by_ne_clauses(context,
+ partclauseinfo->ne_clauses);
+
+ /* Remove any partitions we found to not be needed */
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ PartitionPruneContext orcontext;
+ Bitmapset *or_parts;
+
+ orcontext.rt_index = context->rt_index;
+ orcontext.relation = context->relation;
+ orcontext.clauseinfo = NULL;
+
+ or_parts = get_partitions_from_or_clause_args(&orcontext, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_excluded_by_ne_clauses
+ *
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_clauses(PartitionPruneContext *context,
+ List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+ Relation relation = context->relation;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
+
+ memset(&arg, 0, sizeof(arg));
+
+ /*
+ * Build a Bitmapset to record the indexes of all datums of the
+ * query that are found in boundinfo.
+ */
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->value, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ arg.datums = &datum;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(PartitionPruneContext *context,
+ List *or_clause_args)
+{
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_clause_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionClauseInfo partclauseinfo;
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ extract_partition_key_clauses(partkey, clauses, context->rt_index,
+ &partclauseinfo);
+
+ if (!partclauseinfo.foundkeyclauses)
+ {
+ List *partconstr = RelationGetPartitionQual(context->relation);
+ PartitionDesc partdesc;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->rt_index,
+ 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ partdesc = RelationGetPartitionDesc(context->relation);
+ arg_partset = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ subcontext.rt_index = context->rt_index;
+ subcontext.relation = context->relation;
+ subcontext.clauseinfo = &partclauseinfo;
+ arg_partset = get_partitions_from_clauses(&subcontext);
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
+ equal((expr), (partexpr)))
+
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_key_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to a particular
+ * key in 'partclauseinfo'. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing the
+ * <> operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * partclauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the PartitionClauseInfo is fully populated with all clauses.
+ */
+static void
+extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index,
+ PartitionClauseInfo *partclauseinfo)
+{
+ int i;
+ ListCell *lc;
+
+ memset(partclauseinfo, 0, sizeof(PartitionClauseInfo));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ ListCell *partexprs_item;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ PartClause *pc;
+ Oid partopfamily = partkey->partopfamily[i];
+ Oid partcoll = partkey->partcollation[i];
+ Oid commutator = InvalidOid;
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+
+ /*
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ partexpr = (Expr *) lfirst(partexprs_item);
+
+ /*
+ * Expressions stored for the PartitionKey in the relcache are
+ * all stored with the dummy varno of 1. Change that to what
+ * we need.
+ */
+ if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ valueexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * The clause is also useless if its collation differs from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm that the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_opfuncid = saop->opfuncid;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+ bool negated = false;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * The clause is also useless if its collation differs from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is indeed a part of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy == BTEqualStrategyNumber)
+ negated = true;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: must not clobber the outer loop's i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ if (!elem_nulls[j])
+ elem_exprs = lappend(elem_exprs,
+ makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval));
+ else
+ elem_exprs = lappend(elem_exprs,
+ makeNullConst(ARR_ELEMTYPE(arrval),
+ -1,
+ arr->constcollid));
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1);
+ Expr *elem_clause;
+
+ if (IsA(rightop, Const) && ((Const *) rightop)->constisnull)
+ {
+ NullTest *nulltest = makeNode(NullTest);
+
+ nulltest->arg = (Expr *) leftop;
+ nulltest->nulltesttype = !negated ? IS_NULL
+ : IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ elem_clause = (Expr *) nulltest;
+ }
+ else
+ {
+ OpExpr *opexpr = makeNode(OpExpr);
+
+ opexpr->opno = saop_op;
+ opexpr->opfuncid = saop_opfuncid;
+ opexpr->opresulttype = BOOLOID;
+ opexpr->opretset = false;
+ opexpr->opcollid = InvalidOid;
+ opexpr->inputcollid = saop_coll;
+ opexpr->args = list_make2(leftop, rightop);
+ opexpr->location = -1;
+ elem_clause = (Expr *) opexpr;
+ }
+
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ }
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal values that we're able to determine.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ clauseinfo = context->clauseinfo;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *clauselist = clauseinfo->keyclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets a field
+ * in context->clauseinfo to inform the caller that we found such a clause.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+ List *newlist;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ hash_clause = NULL;
+ newlist = NIL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ partkey->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, i,
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ newlist = lappend(newlist, pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the current best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, i,
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ newlist = lappend(newlist, pc);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ newlist = lappend(newlist, hash_clause);
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, i,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the newlist.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ newlist = lappend(newlist, btree_clauses[s]);
+ }
+
+ /*
+ * Replace the old List with the new one with the redundant clauses
+ * removed.
+ */
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions of 'rel' that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_bound_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_bound_bsearch works. If the bound at maxoff exactly
+ * matches maxkeys (is_equal), but maxkeys is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_bound_bsearch would've returned the offset of just one of
+ * those. If minkeys is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->min_incl
+ ? minoff - 1 : minoff + 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->max_incl
+ ? maxoff + 1 : maxoff - 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * as the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but it may not be the case. It might actually be
+ * the upper bound of an unassigned range of values; if so, we move
+ * minoff/maxoff to the adjacent bound which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e5d2de5330..b2f5f564b0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2127,6 +2127,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5010,6 +5029,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..78d43ea07c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,13 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+typedef struct PartitionPruneContext
+{
+ int rt_index;
+ Relation relation;
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
typedef struct PartitionDescData *PartitionDesc;
extern void RelationBuildPartitionDesc(Relation relation);
@@ -73,4 +80,10 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern void populate_partition_clauses(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..642ea0fbde 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,35 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key. Each matching clause
+ * is stored in the 'keyclauses' list for the partition key index that it was
+ * matched to. Other details are also stored, such as OR clauses and
+ * not-equal (<>) clauses. Nullness properties are also stored.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
v22-0003-Move-some-code-of-set_append_rel_size-to-separat.patch
From c5731f9e27d88b50f214cc6d34bcd4596ca022b6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v22 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition RelOptInfo
from the information in the parent's RelOptInfo into a separate function.
The pairwise-join related code will need to call it to minimally
initialize the partitions that earlier planning would have considered
pruned and hence left untouched. No such partitions exist currently,
because the current pruning method touches each partition (setting its
basic properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd1a58336b..fd68374e20 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index ef7173fbf8..142eecd733 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -301,5 +301,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
Hi Amit,
On 01/26/2018 04:17 AM, Amit Langote wrote:
I made a few of those myself in the updated patches.
Thanks a lot again for your work on this.
This needs a rebase.
Best regards,
Jesper
Hi Jesper.
On 2018/01/29 22:13, Jesper Pedersen wrote:
Hi Amit,
On 01/26/2018 04:17 AM, Amit Langote wrote:
I made a few of those myself in the updated patches.
Thanks a lot again for your work on this.
This needs a rebase.
AFAICS, v22 cleanly applies to HEAD (c12693d8f3 [1]), compiles, and make
check passes.
Thanks,
Amit
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c12693d8f3
Hello, let me make some comments.
At Tue, 30 Jan 2018 09:57:44 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <4a7dda08-b883-6e5e-b0bf-f5ce95584e9e@lab.ntt.co.jp>
Hi Jesper.
On 2018/01/29 22:13, Jesper Pedersen wrote:
Hi Amit,
On 01/26/2018 04:17 AM, Amit Langote wrote:
I made a few of those myself in the updated patches.
Thanks a lot again for your work on this.
This needs a rebase.
AFAICS, v22 cleanly applies to HEAD (c12693d8f3 [1]), compiles, and make
check passes.
Yes, it cleanly applies to HEAD and seems working.
0001 seems fine.
I have some random comments on 0002.
-extract_partition_key_clauses implicitly assumes that the
commutator of an inequality operator is the same as the original
operator. (Commutation for such operators is an identity
function.)
I believe that is always true for a sane operator definition, but
wouldn't it be better to use the commutator instead of
opclause->opno as the source of the negator, if any? (Attached diff 1)
-get_partitions_from_or_clause_args discards arg_partset with all
bits set when the partition constraint doesn't refute the whole
partition. get_partitions_from_clauses eventually does the same
thing, but it's a waste of cycles and looks weird. It seems the
intent was to return immediately there.
/* Couldn't eliminate any of the partitions. */
partdesc = RelationGetPartitionDesc(context->relation);
- arg_partset = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
}

subcontext.rt_index = context->rt_index;
subcontext.relation = context->relation;
subcontext.clauseinfo = &partclauseinfo;
!> arg_partset = get_partitions_from_clauses(&subcontext);
-get_partitions_from_or_clause_args converts IN (null) into a
nulltest, and the nulltest doesn't exclude a child whose
partition key column can be null.
drop table if exists p;
create table p (a int, b int) partition by list (a);
create table c1 partition of p for values in (1, 5, 7);
create table c2 partition of p for values in (4, 6, null);
insert into p values (1, 0), (null, 0);
explain select tableoid::regclass, * from p where a in (1, null);
QUERY PLAN
-----------------------------------------------------------------
Result (cost=0.00..76.72 rows=22 width=12)
-> Append (cost=0.00..76.50 rows=22 width=12)
-> Seq Scan on c1 (cost=0.00..38.25 rows=11 width=12)
Filter: (a = ANY ('{1,NULL}'::integer[]))
-> Seq Scan on c2 (cost=0.00..38.25 rows=11 width=12)
Filter: (a = ANY ('{1,NULL}'::integer[]))
Although the clause "a in (null)" doesn't match the (null, 0)
row, so it does no harm in the end, I don't think this is the right
behavior. A null in an SAOP rightop should just be ignored during
partition pruning. Or is there some purpose to this behavior?
- In extract_bounding_datums, clauseinfo is set twice to the same
value.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
v22-0002_diff1.patch
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ab17524..a2488ab 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -2111,7 +2111,6 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
PartClause *pc;
Oid partopfamily = partkey->partopfamily[i];
Oid partcoll = partkey->partcollation[i];
- Oid commutator = InvalidOid;
AttrNumber partattno = partkey->partattrs[i];
Expr *partexpr = NULL;
@@ -2144,6 +2143,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (IsA(clause, OpExpr))
{
OpExpr *opclause = (OpExpr *) clause;
+ Oid comparator = opclause->opno;
Expr *leftop,
*rightop,
*valueexpr;
@@ -2161,13 +2161,14 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
valueexpr = rightop;
else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
{
- valueexpr = leftop;
-
- commutator = get_commutator(opclause->opno);
+ Oid commutator = get_commutator(opclause->opno);
/* nothing we can do unless we can swap the operands */
if (!OidIsValid(commutator))
continue;
+
+ valueexpr = leftop;
+ comparator = commutator;
}
else
/* Clause does not match this partition key. */
@@ -2212,7 +2213,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
* equality operator *and* this is a list partitioned
* table, we can use it prune partitions.
*/
- negator = get_negator(opclause->opno);
+ negator = get_negator(comparator);
if (OidIsValid(negator) &&
op_in_opfamily(negator, partopfamily))
{
@@ -2236,7 +2237,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
}
pc = (PartClause *) palloc0(sizeof(PartClause));
- pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->opno = comparator;
pc->inputcollid = opclause->inputcollid;
pc->value = valueexpr;
Hi Amit,
On 01/29/2018 07:57 PM, Amit Langote wrote:
This needs a rebase.
AFAICS, v22 cleanly applies to HEAD (c12693d8f3 [1]), compiles, and make
check passes.
It was a rebase error; I should have checked against a clean master.
Sorry for the noise.
Best regards,
Jesper
Horiguchi-san,
Thanks for the review.
On 2018/01/30 20:43, Kyotaro HORIGUCHI wrote:
At Tue, 30 Jan 2018 09:57:44 +0900, Amit Langote wrote:
AFAICS, v22 cleanly applies to HEAD (c12693d8f3 [1]), compiles, and make
I have some random comments on 0002.
-extract_partition_key_clauses implicitly assumes that the
commutator of inequality operator is the same to the original
operator. (commutation for such operators is an identity
function.)
Yeah, it seems so.
I believe it is always true for a sane operator definition, but
perhaps wouldn't it be better using commutator instead of
opclause->opno as the source of negator if any? (Attached diff 1)
ISTM, the same thing happens with or without the patch.
- pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->opno = comparator;
comparator as added by the patch is effectively equal to the RHS
expression in the deleted line.
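To make the point about commutators and negators concrete, here is a minimal sketch of the normalization step, assuming a toy operator table. ToyOp, toy_commutator, toy_negator and normalize_clause_op are hypothetical names for illustration only; they are not the patch's code and the enum values are not real operator OIDs.

```c
#include <assert.h>

/* Hypothetical stand-ins for operator OIDs (not PostgreSQL's real values). */
typedef enum { OP_INVALID = 0, OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_NE } ToyOp;

/* Operator to use after swapping the operands: "5 < a" becomes "a > 5". */
static ToyOp
toy_commutator(ToyOp op)
{
	switch (op)
	{
		case OP_LT: return OP_GT;
		case OP_LE: return OP_GE;
		case OP_EQ: return OP_EQ;
		case OP_GE: return OP_LE;
		case OP_GT: return OP_LT;
		case OP_NE: return OP_NE;	/* <> commutes to itself */
		default:    return OP_INVALID;
	}
}

/* Operator that yields the logically opposite result: NOT (a <> 5) is a = 5. */
static ToyOp
toy_negator(ToyOp op)
{
	switch (op)
	{
		case OP_LT: return OP_GE;
		case OP_LE: return OP_GT;
		case OP_EQ: return OP_NE;
		case OP_GE: return OP_LT;
		case OP_GT: return OP_LE;
		case OP_NE: return OP_EQ;
		default:    return OP_INVALID;
	}
}

/*
 * Return the operator to use once the partition key is on the left-hand
 * side, or OP_INVALID if the operands cannot be swapped.
 */
static ToyOp
normalize_clause_op(ToyOp op, int key_is_on_left)
{
	ToyOp		comparator = op;

	if (!key_is_on_left)
	{
		ToyOp		commutator = toy_commutator(op);

		/* nothing we can do unless we can swap the operands */
		assert(op != OP_INVALID);
		if (commutator == OP_INVALID)
			return OP_INVALID;
		comparator = commutator;
	}
	return comparator;
}
```

In this toy table <> is its own commutator, so taking the negator of the original operator or of the normalized one gives the same answer, which is the case discussed above; taking it from the normalized operator simply avoids relying on that property.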
-get_partitions_from_or_clause_args abandons arg_partset with all
bit set when partition constraint doesn't refute whole the
partition. Finally get_partitions_from_clauses does the same
thing but it's waste of cycles and looks weird. It seems to have
intended to return immediately there.
/* Couldn't eliminate any of the partitions. */
partdesc = RelationGetPartitionDesc(context->relation);
- arg_partset = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
}

subcontext.rt_index = context->rt_index;
subcontext.relation = context->relation;
subcontext.clauseinfo = &partclauseinfo;
!> arg_partset = get_partitions_from_clauses(&subcontext);
You're right, fixed.
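For illustration, the early-return idea can be sketched as follows, assuming partition sets are small bitmasks rather than Bitmapsets. PartSet, all_partitions and partitions_for_or are hypothetical names, and the arms_examined counter exists only to make the saved work visible; none of this is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set means partition i must be scanned (toy limit: 32 partitions). */
typedef uint32_t PartSet;

static PartSet
all_partitions(int nparts)
{
	assert(nparts > 0 && nparts <= 32);
	return nparts == 32 ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/*
 * arg_sets[i] is the set of partitions that may satisfy the i'th OR arm.
 * The result is the union over all arms.  Once one arm cannot eliminate
 * any partition the union is already complete, so return immediately
 * instead of examining the remaining arms.
 */
static PartSet
partitions_for_or(const PartSet *arg_sets, int nargs, int nparts,
				  int *arms_examined)
{
	PartSet		all = all_partitions(nparts);
	PartSet		result = 0;
	int			i;

	*arms_examined = 0;
	for (i = 0; i < nargs; i++)
	{
		(*arms_examined)++;

		/* Couldn't eliminate any of the partitions. */
		if (arg_sets[i] == all)
			return all;

		result |= arg_sets[i];
	}
	return result;
}
```

With four partitions and arms {0x1, 0xF, 0x4}, the second arm already covers everything, so the third arm is never looked at.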
-get_partitions_from_or_clause_args converts IN (null) into
nulltest and the nulltest doesn't exclude a child that the
partition key column can be null.
drop table if exists p;
create table p (a int, b int) partition by list (a);
create table c1 partition of p for values in (1, 5, 7);
create table c2 partition of p for values in (4, 6, null);
insert into p values (1, 0), (null, 0);
explain select tableoid::regclass, * from p where a in (1, null);
QUERY PLAN
-----------------------------------------------------------------
Result (cost=0.00..76.72 rows=22 width=12)
-> Append (cost=0.00..76.50 rows=22 width=12)
-> Seq Scan on c1 (cost=0.00..38.25 rows=11 width=12)
Filter: (a = ANY ('{1,NULL}'::integer[]))
-> Seq Scan on c2 (cost=0.00..38.25 rows=11 width=12)
Filter: (a = ANY ('{1,NULL}'::integer[]))
Although the clause "a in (null)" doesn't match the (null, 0)
row so it doesn't harm finally, I don't think this is a right
behavior. null in an SAOP rightop should be just ignored on
partition pruning. Or is there any purpose of this behavior?
Yeah, it seems that we're better off ignoring null values appearing in the
IN-list. Framing an IS NULL or IS NOT NULL expression corresponding to a
null value in the SAOP rightop array doesn't seem to be semantically
correct, as you also pointed out. In ExecEvalScalarArrayOpExpr(), I see
that a null in the rightop array (IN-list) doesn't lead to selecting rows
containing null in the corresponding column.
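A small sketch of the three-valued logic involved, assuming a strict equality operator and integer values only. TriBool, NullableInt, eq_any and row_passes are hypothetical names; this models the semantics described above and is not the executor's code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

typedef struct
{
	int			value;
	int			isnull;
} NullableInt;

/* Toy model of "lhs = ANY (elems)" with a strict equality operator. */
static TriBool
eq_any(NullableInt lhs, const NullableInt *elems, size_t n)
{
	TriBool		result = TV_FALSE;
	size_t		i;

	if (n == 0)
		return TV_FALSE;		/* empty array: false even for a null lhs */
	if (lhs.isnull)
		return TV_NULL;			/* strict operator: null in, null out */

	for (i = 0; i < n; i++)
	{
		if (elems[i].isnull)
			result = TV_NULL;	/* comparison with null is unknown */
		else if (elems[i].value == lhs.value)
			return TV_TRUE;
	}
	return result;
}

/* A WHERE clause keeps a row only when the qual evaluates to true. */
static int
row_passes(NullableInt a, const NullableInt *elems, size_t n)
{
	return eq_any(a, elems, n) == TV_TRUE;
}
```

For the list (1, null), a row with a = 1 passes, while rows with a = 4 or a null a both evaluate to unknown and are filtered out. So the null element can never select a row, and a partition that accepts only nulls does not need to be scanned on its account.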
- In extract_bounding_datums, clauseinfo is set twice to the same
value.
Oops, my bad when merging in David's patch.
Updated patch set attached. Thanks again.
Regards,
Amit
Attachments:
v23-0001-Some-interface-changes-for-partition_bound_-cmp-.patch
From 42ca7750e2b89caa3aee0c9ab7479a7f29953fff Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v23 1/5] Some interface changes for
partition_bound_{cmp/bsearch}
Introduces a notion of PartitionBoundCmpArg, which replaces the set
of arguments void *probe and bool probe_is_bound of both
partition_bound_cmp and partition_bound_bsearch. It wasn't possible
before to specify the number of datums when a non-bound type of
probe is passed. Slightly tweaking the existing interface to allow
specifying the same seems awkward. So, instead encapsulate that
into PartitionBoundCmpArg. Also, modify partition_rbound_datum_cmp
to compare caller-specified number of datums, instead of
key->partnatts datums.
---
src/backend/catalog/partition.c | 166 +++++++++++++++++++++++++++++-----------
1 file changed, 123 insertions(+), 43 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e69bbc0345..de2b53e0c8 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,31 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -170,14 +195,15 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
static int32 partition_bound_cmp(PartitionKey key,
PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
+ int offset, PartitionBoundCmpArg *arg);
static int partition_bound_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionBoundCmpArg *arg,
+ bool *is_equal);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -985,6 +1011,8 @@ check_new_partition_bound(char *relname, Relation parent,
valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
+ PartitionHashBound hbound;
+ PartitionBoundCmpArg arg;
/*
* Check rule that every modulus must be a factor of the
@@ -999,8 +1027,14 @@ check_new_partition_bound(char *relname, Relation parent,
* less than or equal to spec->modulus and
* spec->remainder.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ memset(&hbound, 0, sizeof(PartitionHashBound));
+ hbound.modulus = spec->modulus;
+ hbound.remainder = spec->remainder;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.hbound = &hbound;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1073,10 +1107,16 @@ check_new_partition_bound(char *relname, Relation parent,
{
int offset;
bool equal;
-
+ PartitionListValue lbound;
+ PartitionBoundCmpArg arg;
+
+ memset(&lbound, 0, sizeof(PartitionListValue));
+ lbound.value = val->constvalue;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.lbound = &lbound;
offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ &arg, &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1127,6 +1167,7 @@ check_new_partition_bound(char *relname, Relation parent,
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int offset;
bool equal;
+ PartitionBoundCmpArg arg;
Assert(boundinfo &&
boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
@@ -1148,8 +1189,11 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = true;
+ arg.bound.rbound = lower;
+ offset = partition_bound_bsearch(key, boundinfo, &arg,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1163,9 +1207,9 @@ check_new_partition_bound(char *relname, Relation parent,
{
int32 cmpval;
+ arg.bound.rbound = upper;
cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ offset + 1, &arg);
if (cmpval < 0)
{
/*
@@ -2537,12 +2581,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
else
{
bool equal = false;
+ PartitionBoundCmpArg arg;
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2569,11 +2616,15 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
+ PartitionBoundCmpArg arg;
+
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.is_bound = false;
+ arg.datums = values;
+ arg.ndatums = key->partnatts;
bound_offset = partition_bound_bsearch(key,
partdesc->boundinfo,
- values,
- false,
- &equal);
+ &arg, &equal);
/*
* The bound at bound_offset is less than or equal to the
@@ -2845,12 +2896,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2872,11 +2923,11 @@ partition_rbound_datum_cmp(PartitionKey key,
* partition_bound_cmp
*
* Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * specified in *arg.
*/
static int32
partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+ int offset, PartitionBoundCmpArg *arg)
{
Datum *bound_datums = boundinfo->datums[offset];
int32 cmpval = -1;
@@ -2885,25 +2936,55 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
{
case PARTITION_STRATEGY_HASH:
{
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int modulus,
+ remainder;
+
+ if (arg->is_bound)
+ {
+ modulus = arg->bound.hbound->modulus;
+ remainder = arg->bound.hbound->remainder;
+ }
+ else
+ {
+ modulus = DatumGetInt32(arg->datums[0]);
+ remainder = DatumGetInt32(arg->datums[1]);
+ }
cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ modulus, remainder);
break;
}
case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ {
+ Datum listdatum;
+
+ if (arg->is_bound)
+ listdatum = arg->bound.lbound->value;
+ else
+ {
+ if (arg->ndatums >= 1)
+ listdatum = arg->datums[0];
+ /*
+ * If there's no tuple datum to compare with the bound,
+ * consider the latter to be greater.
+ */
+ else
+ return 1;
+ }
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ bound_datums[0],
+ listdatum));
+ break;
+ }
case PARTITION_STRATEGY_RANGE:
{
PartitionRangeDatumKind *kind = boundinfo->kind[offset];
- if (probe_is_bound)
+ if (arg->is_bound)
{
/*
* We need to pass whether the existing bound is a lower
@@ -2914,12 +2995,13 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
cmpval = partition_rbound_cmp(key,
bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
+ arg->bound.rbound);
}
else
cmpval = partition_rbound_datum_cmp(key,
bound_datums, kind,
- (Datum *) probe);
+ arg->datums,
+ arg->ndatums);
break;
}
@@ -2933,20 +3015,19 @@ partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
/*
* Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
+ * bound in array boundinfo->datums which is less than or equal to *arg.
+ * If all bounds in the array are greater than *arg, -1 is returned.
*
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * *arg may contain either a partition bound struct or a Datum array
+ * representing the partition key of a tuple being routed. We simply pass
+ * that down to partition_bound_cmp where it is interpreted appropriately.
*
- * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *is_equal is set to whether the bound at the returned index is exactly
+ * equal to *arg.
*/
static int
partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+ PartitionBoundCmpArg *arg, bool *is_equal)
{
int lo,
hi,
@@ -2959,8 +3040,7 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_bound_cmp(key, boundinfo, mid, arg);
if (cmpval <= 0)
{
lo = mid;
--
2.11.0
Attachment: v23-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From b20b3a3deca2f9469045b2ea89683cb34ad5cc15 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v23 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of clauses and return
a set of indexes of the partitions that satisfy all of those clauses.
The aforementioned list must contain all clauses that were matched
to the partition key(s) using populate_partition_clauses().
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
---
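Note for readers (not part of the commit): the way <> clauses prune list
partitions in get_partitions_excluded_by_ne_clauses() can be sketched
outside the backend. The following is only a minimal, standalone
illustration of the counting idea. ToyListBounds, toy_exclude_by_ne() and
the ne_hit[] array are hypothetical names made up for this sketch; they
stand in for PartitionBoundInfo, Bitmapsets and the bsearch lookups that
the patch itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for list-partition bounds: bound value k is accepted
 * by partition part_of_bound[k].  A default partition owns no bound values.
 * This is NOT PostgreSQL's PartitionBoundInfo.
 */
typedef struct ToyListBounds
{
	int			nparts;
	int			nbounds;
	const int  *part_of_bound;
} ToyListBounds;

/*
 * ne_hit[k] is true when the query contains "key <> (bound value k)".
 * Sets excluded[p] for every partition whose permitted values are all ruled
 * out, and returns how many partitions were excluded (-1 if out of memory).
 */
int
toy_exclude_by_ne(const ToyListBounds *b, const bool *ne_hit, bool *excluded)
{
	int		   *in_part;
	int		   *found;
	int			nexcluded = 0;

	assert(b->nparts > 0);
	in_part = calloc((size_t) b->nparts, sizeof(int));
	found = calloc((size_t) b->nparts, sizeof(int));
	if (in_part == NULL || found == NULL)
	{
		free(in_part);
		free(found);
		return -1;
	}

	/* Count, per partition, the permitted values and the ones hit by <>. */
	for (int k = 0; k < b->nbounds; k++)
	{
		int			p = b->part_of_bound[k];

		assert(p >= 0 && p < b->nparts);
		in_part[p]++;
		if (ne_hit[k])
			found[p]++;
	}

	for (int p = 0; p < b->nparts; p++)
	{
		/* found[p] > 0 keeps a partition without bounds (default) alive. */
		excluded[p] = (found[p] > 0 && found[p] == in_part[p]);
		if (excluded[p])
			nexcluded++;
	}

	free(in_part);
	free(found);
	return nexcluded;
}
```

With bounds {a, b} in partition 0, {c} in partition 1, {d} in partition 2
and a default partition 3, the clauses "key <> a AND key <> c" exclude only
partition 1: partition 0 still accepts b, and the default partition is never
excluded by this step.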
src/backend/catalog/partition.c | 2071 ++++++++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 13 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 31 +
src/include/optimizer/clauses.h | 2 +
8 files changed, 2144 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index de2b53e0c8..a5179d177e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -163,6 +167,81 @@ typedef struct PartitionBoundCmpArg
int ndatums;
} PartitionBoundCmpArg;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -211,6 +290,35 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(
+ PartitionPruneContext *context,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(
+ PartitionPruneContext *context,
+ List *or_clause_args);
+static void extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index, PartitionClauseInfo *partclauses);
+static bool extract_bounding_datums(PartitionKey partkey,
+ PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *pc,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static void remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1581,9 +1689,1972 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * populate_partition_clauses
+ * Processes 'clauses' to try to match them to relation's partition
+ * keys. If any compatible clauses are found which match a partition
+ * key, then these clauses are stored in 'partclauseinfo'.
+ *
+ * The caller must ensure that 'clauses' is not an empty List. Upon return,
+ * callers must also check whether the 'constfalse' flag of 'partclauseinfo'
+ * has been set; if so, 'partclauseinfo' may be only partially
+ * populated.
+ */
+void
+populate_partition_clauses(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo)
+{
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+ PartitionBoundInfo boundinfo;
+
+ Assert(clauses != NIL);
+
+ partkey = RelationGetPartitionKey(relation);
+ partdesc = RelationGetPartitionDesc(relation);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(clauses);
+ boundinfo = partdesc->boundinfo;
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement
+ * is perhaps unlikely for non-default partitions, but it may be more
+ * likely in the case of default partitions, so we'll add the parent
+ * partition table's partition qual to the clause list in this case only.
+ * This may result in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ List *partqual = RelationGetPartitionQual(relation);
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partqual, 1, rt_index, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ extract_partition_key_clauses(partkey, clauses, rt_index, partclauseinfo);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine all partitions of context->relation that could possibly
+ * contain a record that matches clauses as described in
+ * context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartitionDesc partdesc;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+
+ /*
+ * Check if there were proofs that no partitions can match due to some
+ * clause items contradicting another.
+ */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ partdesc = RelationGetPartitionDesc(context->relation);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ else
+ {
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+
+ /* collapse clauses down to the most restrictive set */
+ remove_redundant_clauses(partkey, context);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(partkey, context, &keys))
+ {
+ result = get_partitions_for_keys(context->relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have the values we'd need to eliminate
+ * partitions using get_partitions_for_keys, likely because
+ * context->clauseinfo only contained <> clauses and/or OR
+ * clauses, which are handled further below in this function.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ }
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (partclauseinfo->ne_clauses)
+ {
+ Bitmapset *ne_parts;
+
+ ne_parts = get_partitions_excluded_by_ne_clauses(context,
+ partclauseinfo->ne_clauses);
+
+ /* Remove any partitions we found to not be needed */
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ PartitionPruneContext orcontext;
+ Bitmapset *or_parts;
+
+ orcontext.rt_index = context->rt_index;
+ orcontext.relation = context->relation;
+ orcontext.clauseinfo = NULL;
+
+ or_parts = get_partitions_from_or_clause_args(&orcontext, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_excluded_by_ne_clauses
+ *
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_clauses(PartitionPruneContext *context,
+ List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+ Relation relation = context->relation;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ PartitionBoundCmpArg arg;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
+
+ memset(&arg, 0, sizeof(arg));
+
+ /*
+ * Build a Bitmapset to record the indexes of all datums of the
+ * query that are found in boundinfo.
+ */
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->value, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ arg.datums = &datum;
+ arg.ndatums = 1;
+ offset = partition_bound_bsearch(partkey, boundinfo, &arg,
+ &is_equal);
+
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(PartitionPruneContext *context,
+ List *or_clause_args)
+{
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_clause_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionClauseInfo partclauseinfo;
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ extract_partition_key_clauses(partkey, clauses, context->rt_index,
+ &partclauseinfo);
+
+ if (!partclauseinfo.foundkeyclauses)
+ {
+ List *partconstr = RelationGetPartitionQual(context->relation);
+ PartitionDesc partdesc;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->rt_index,
+ 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ partdesc = RelationGetPartitionDesc(context->relation);
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ subcontext.rt_index = context->rt_index;
+ subcontext.relation = context->relation;
+ subcontext.clauseinfo = &partclauseinfo;
+ arg_partset = get_partitions_from_clauses(&subcontext);
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
+ equal((expr), (partexpr)))
+
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_key_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to the particular
+ * key in 'partclauseinfo'. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing the
+ * <> operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * partclauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the PartitionClauseInfo is fully populated with all clauses.
+ */
+static void
+extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index,
+ PartitionClauseInfo *partclauseinfo)
+{
+ int i;
+ ListCell *lc;
+
+ memset(partclauseinfo, 0, sizeof(PartitionClauseInfo));
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ ListCell *partexprs_item;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ PartClause *pc;
+ Oid partopfamily = partkey->partopfamily[i];
+ Oid partcoll = partkey->partcollation[i];
+ Oid commutator = InvalidOid;
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+
+ /*
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ partexpr = (Expr *) lfirst(partexprs_item);
+
+ /*
+ * Expressions stored for the PartitionKey in the relcache are
+ * all stored with the dummy varno of 1. Change that to what
+ * we need.
+ */
+ if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ valueexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which is not listed
+ * as part of any operator family. We can still handle it if
+ * its negator is the equality operator of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ continue;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j indexes array elements; i is the partkey index */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ }
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal values that we're able to determine.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * are matched by some combination of equality clauses and IS NULL
+ * clauses. LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such a
+ * tuple as both the minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *clauselist = clauseinfo->keyclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+
+ default:
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a clause and, before returning, sets a
+ * field in context->clauseinfo to inform the caller that we found such a clause.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+ List *newlist;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ hash_clause = NULL;
+ newlist = NIL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ partkey->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, i,
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ newlist = lappend(newlist, pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauseinfo->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, i,
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ newlist = lappend(newlist, pc);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ newlist = lappend(newlist, hash_clause);
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, i,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add to the newlist.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ newlist = lappend(newlist, btree_clauses[s]);
+ }
+
+ /*
+ * Replace the old List with the new one with the redundant clauses
+ * removed.
+ */
+ list_free(partclauseinfo->keyclauses[i]);
+ partclauseinfo->keyclauses[i] = newlist;
+ }
+}
+
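
To make the reduction rules in remove_redundant_clauses() easier to follow, here is a minimal standalone sketch of the same idea for a single integer key. It is not the patch's code: DemoOp, DemoClause, demo_cmp() and demo_remove_redundant() are hypothetical names made up for this illustration, plain int comparison stands in for the operator lookup done by partition_cmp_args(), and the sketch assumes every pair of values can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btree strategies and PartClause. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_COUNT } DemoOp;

typedef struct
{
    DemoOp op;      /* operator of the clause "x op value" */
    int    value;   /* constant operand */
} DemoClause;

/* Evaluate "left op right" on plain ints. */
static bool
demo_cmp(DemoOp op, int left, int right)
{
    switch (op)
    {
        case OP_LT: return left < right;
        case OP_LE: return left <= right;
        case OP_EQ: return left == right;
        case OP_GE: return left >= right;
        case OP_GT: return left > right;
        default:    return false;
    }
}

/*
 * Reduce n clauses on one key to at most one clause per operator.
 * Survivors are written to out[] (room for OP_COUNT entries); returns their
 * count, or -1 if the clauses contradict each other.
 */
static int
demo_remove_redundant(const DemoClause *clauses, int n, DemoClause *out)
{
    const DemoClause *best[OP_COUNT] = {NULL};
    int         i, s, nout = 0;

    assert(n >= 0);

    /* Per operator, keep the most restrictive clause seen so far. */
    for (i = 0; i < n; i++)
    {
        const DemoClause *c = &clauses[i];

        if (best[c->op] == NULL ||
            demo_cmp(c->op, c->value, best[c->op]->value))
            best[c->op] = c;
        else if (c->op == OP_EQ)
            return -1;          /* e.g. x = 1 AND x = 2 */
    }

    /* An equality clause either contradicts or supersedes the others. */
    if (best[OP_EQ] != NULL)
    {
        for (s = 0; s < OP_COUNT; s++)
        {
            if (s == OP_EQ || best[s] == NULL)
                continue;
            if (!demo_cmp((DemoOp) s, best[OP_EQ]->value, best[s]->value))
                return -1;      /* e.g. x = 5 AND x < 5 */
            best[s] = NULL;
        }
    }

    /* Keep only one of <, <= and only one of >, >=. */
    if (best[OP_LT] != NULL && best[OP_LE] != NULL)
        best[best[OP_LT]->value <= best[OP_LE]->value ? OP_LE : OP_LT] = NULL;
    if (best[OP_GT] != NULL && best[OP_GE] != NULL)
        best[best[OP_GT]->value >= best[OP_GE]->value ? OP_GE : OP_GT] = NULL;

    /* Collect what is left, one clause per surviving operator. */
    for (s = 0; s < OP_COUNT; s++)
        if (best[s] != NULL)
            out[nout++] = *best[s];
    return nout;
}
```

Fed x > 1 AND x > 2 AND x >= 5, the sketch keeps only x >= 5. A return value of -1 plays the role of setting constfalse in the patch.
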
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of either operand is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions of 'rel' that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if no partitions to see. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
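
The lookup at the end of get_partitions_for_keys_hash() can be pictured with a small standalone sketch. It assumes that indexes[] has one slot per remainder of the greatest modulus and that a partition with modulus m and remainder r owns slots r, r + m, r + 2m, and so on. DemoHashBound, DEMO_MAX_MODULUS and both demo_* functions are hypothetical names, and the row hash is passed in rather than computed by compute_hash_value().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_MODULUS 64     /* arbitrary cap for this sketch */

/* Hypothetical description of one hash partition's bound. */
typedef struct
{
    int modulus;
    int remainder;
} DemoHashBound;

/*
 * Fill indexes[0 .. greatest modulus) with the owning partition of each
 * remainder, or -1 where no partition accepts it.  Returns the greatest
 * modulus.  The caller must pass at least one partition.
 */
static int
demo_build_hash_indexes(const DemoHashBound *parts, int nparts, int *indexes)
{
    int greatest = 0;
    int i, r;

    for (i = 0; i < nparts; i++)
        if (parts[i].modulus > greatest)
            greatest = parts[i].modulus;
    assert(greatest > 0 && greatest <= DEMO_MAX_MODULUS);

    for (r = 0; r < greatest; r++)
        indexes[r] = -1;
    for (i = 0; i < nparts; i++)
        for (r = parts[i].remainder; r < greatest; r += parts[i].modulus)
            indexes[r] = i;

    return greatest;
}

/* Map a row hash to a partition index, or -1 if none accepts it. */
static int
demo_hash_lookup(const int *indexes, int greatest, uint64_t row_hash)
{
    return indexes[row_hash % (uint64_t) greatest];
}
```

With partitions (modulus 2, remainder 0) and (modulus 4, remainder 1), remainder 3 has no owner, so a hash landing there selects no partition. That corresponds to the final return NULL in the function above.
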
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_bound_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_bound_bsearch works. If the bound at maxoff exactly
+ * matches maxkey (is_equal), but the maxkey is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not
+ * all values in the given range will have a partition assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
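
The min/max offset arithmetic in get_partitions_for_keys_list() is easy to get wrong by one, so here is a minimal standalone sketch of it over a sorted int array. All names (DemoRange, demo_bsearch_le, demo_list_range) are hypothetical. The binary search is only assumed to behave like partition_bound_bsearch(), that is, to return the offset of the greatest bound <= the probe, or -1. The result is a plain bitmask instead of a Bitmapset, which limits the sketch to fewer than 32 partitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical query range; either end may be absent. */
typedef struct
{
    bool has_min, min_incl;
    int  min;
    bool has_max, max_incl;
    int  max;
} DemoRange;

/* Offset of the greatest datum <= probe, or -1; sets *is_equal. */
static int
demo_bsearch_le(const int *datums, int n, int probe, bool *is_equal)
{
    int lo = -1, hi = n - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (datums[mid] <= probe)
        {
            lo = mid;
            *is_equal = (datums[mid] == probe);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Bitmask of partitions to scan for range q.  indexes[i] is the partition
 * accepting datums[i]; default_index is -1 if there is no default.
 */
static unsigned
demo_list_range(const int *datums, const int *indexes, int n,
                int default_index, const DemoRange *q)
{
    /* Range queries always keep the default partition, if any. */
    unsigned result = (default_index >= 0) ? 1u << default_index : 0u;
    int      minoff = 0, maxoff = n - 1, i;
    bool     is_equal;

    if (q->has_min)
    {
        minoff = demo_bsearch_le(datums, n, q->min, &is_equal);
        if (minoff < 0)
            minoff = 0;         /* every datum is greater than min */
        else if (!is_equal || !q->min_incl)
            minoff++;           /* step right to the first datum that fits */
    }
    if (q->has_max)
    {
        maxoff = demo_bsearch_le(datums, n, q->max, &is_equal);
        if (maxoff >= 0 && is_equal && !q->max_incl)
            maxoff--;           /* exclusive max: drop the exact match */
    }

    /* An empty interval (minoff > maxoff) leaves only the default. */
    for (i = minoff; i <= maxoff; i++)
        result |= 1u << indexes[i];
    return result;
}
```

For datums {10, 20, 30, 40} and the query 15 < x <= 30, minoff ends at 1 and maxoff at 2, so only the partitions holding 20 and 30 are selected, plus the default.
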
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ PartitionBoundCmpArg arg;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many equality keys as partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->eqkeys;
+ arg.ndatums = keys->n_eqkeys;
+ eqoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_bound_bsearch works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->minkeys;
+ arg.ndatums = keys->n_minkeys;
+ minoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_bound_bsearch would've returned the offset of just one of
+ * those. If minkey is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->min_incl
+ ? minoff - 1 : minoff + 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ memset(&arg, 0, sizeof(PartitionBoundCmpArg));
+ arg.datums = keys->maxkeys;
+ arg.ndatums = keys->n_maxkeys;
+ maxoff = partition_bound_bsearch(partkey, boundinfo, &arg, &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && arg.ndatums < partkey->partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_bound_cmp(partkey, boundinfo,
+ keys->max_incl
+ ? maxoff + 1 : maxoff - 1,
+ &arg);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but it may not be the case. It might actually be
+ * the upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. However, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
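
The "eqoff + 1" step in the equality branch of get_partitions_for_keys_range() is perhaps the least obvious part, so here is a minimal standalone sketch of it for a single int key. It assumes the layout described in the comment there: the bounds are sorted, indexes[] has one more entry than datums[], and indexes[i] is the partition whose upper bound is datums[i], or -1 if the values just below that bound are unassigned. demo_bsearch_le() and demo_range_eq_lookup() are hypothetical names.

```c
#include <assert.h>

/* Offset of the greatest datum <= probe, or -1 if there is none. */
static int
demo_bsearch_le(const int *datums, int n, int probe)
{
    int lo = -1, hi = n - 1;

    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (datums[mid] <= probe)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Return the partition that accepts 'key', or default_index (which is -1
 * when there is no default partition) if key falls into a gap.
 */
static int
demo_range_eq_lookup(const int *datums, const int *indexes, int n,
                     int default_index, int key)
{
    int off;

    assert(n >= 0);
    off = demo_bsearch_le(datums, n, key);

    /*
     * datums[off] is the lower bound of the range containing key, so the
     * bound at off + 1 is its upper bound and indexes[off + 1] its owner.
     */
    if (off >= 0 && indexes[off + 1] >= 0)
        return indexes[off + 1];
    return default_index;
}
```

With partitions [0, 10) and [20, 30), the bounds are {0, 10, 20, 30} and indexes is {-1, 0, -1, 1, -1}. Key 5 finds offset 0 and therefore partition 0, while key 15 finds offset 1, sees -1 at indexes[2], and falls back to the default.
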
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3001c493..2fc54defbd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2127,6 +2127,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5009,6 +5028,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..78d43ea07c 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,13 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+typedef struct PartitionPruneContext
+{
+ int rt_index;
+ Relation relation;
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
typedef struct PartitionDescData *PartitionDesc;
extern void RelationBuildPartitionDesc(Relation relation);
@@ -73,4 +80,10 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern void populate_partition_clauses(Relation relation,
+ int rt_index, List *clauses,
+ PartitionClauseInfo *partclauseinfo);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..642ea0fbde 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,35 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key. Each matching clause
+ * is stored in the 'keyclauses' list for the partition key index that it was
+ * matched to. Other details are also stored, such as OR clauses and
+ * not-equal (<>) clauses. Nullness properties are also stored.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of the arguments of one OR clause. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
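For readers skimming the range-partition code near the top of patch 0002, the decision about whether the default partition must be scanned can be sketched in isolation. This is only a simplified model, not the patch's code: the function names are hypothetical, a plain int array stands in for boundinfo->indexes, and a 32-bit mask stands in for the keyisnotnull Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for boundinfo->indexes: indexes[i] is the partition
 * accepting values in the i'th bound slot, or -1 if no non-default partition
 * covers that slot.
 */
bool
gap_in_bound_range(const int *indexes, int minoff, int maxoff)
{
    int i;

    assert(minoff >= 0);
    for (i = minoff; i <= maxoff; i++)
    {
        if (indexes[i] < 0)
            return true;
    }
    return false;
}

/*
 * keyisnotnull is a bitmask standing in for the Bitmapset of the same name:
 * bit k set means key column k is known to be NOT NULL.
 */
bool
some_key_may_be_null(uint32_t keyisnotnull, int partnatts)
{
    int k;

    for (k = 0; k < partnatts; k++)
    {
        if ((keyisnotnull & (UINT32_C(1) << k)) == 0)
            return true;
    }
    return false;
}

/* Combine both tests the way the patch's range code does. */
bool
must_scan_default(const int *indexes, int minoff, int maxoff,
                  int n_minkeys, int n_maxkeys,
                  uint32_t keyisnotnull, int partnatts, bool has_default)
{
    bool include_def = gap_in_bound_range(indexes, minoff, maxoff);

    /* Rows with a NULL key go to the default partition. */
    if (!include_def &&
        (n_minkeys < partnatts || n_maxkeys < partnatts))
        include_def = some_key_may_be_null(keyisnotnull, partnatts);

    return include_def && has_default;
}
```

With a single key known to be NOT NULL, the default partition is needed only when the selected bound slots contain a gap, and never when the table has no default partition.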
v23-0003-Move-some-code-of-set_append_rel_size-to-separat.patch (text/plain; charset=UTF-8)
From 202812a09e1b7500d23a4a29edb8223e32d83555 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v23 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes the basic properties of a partition's
RelOptInfo from the information in the parent's RelOptInfo into a
separate function. The partition-wise join code will need to call it
to minimally initialize partitions that earlier planning considered
pruned and hence left untouched. That situation does not arise
currently, because the current pruning method touches each partition
(setting its basic properties) before considering it pruned.
---
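The attr_needed part of the code being moved is the least obvious, so here is a rough standalone model of it. Everything in it is an assumption made for illustration: copy_attr_needed is a hypothetical name, uint64_t masks stand in for Relids, and translated[] holds child attribute numbers (0 for a dropped parent column) in place of the translated_vars list of Vars.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy per-attribute "needed by these rels" sets from parent to child.
 *
 * parent_needed[] and child_needed[] are indexed by (attno - min_attr), so
 * slots for attno <= 0 hold system attributes. translated[attno - 1] is the
 * child attno matching parent user attribute attno, or 0 if that parent
 * column was dropped.
 */
void
copy_attr_needed(const uint64_t *parent_needed,
                 int parent_min_attr, int parent_max_attr,
                 const int *translated,
                 uint64_t *child_needed, int child_min_attr)
{
    int attno;

    for (attno = parent_min_attr; attno <= parent_max_attr; attno++)
    {
        int index = attno - parent_min_attr;

        if (attno <= 0)
        {
            /* System attributes occupy the same slots in parent and child. */
            assert(parent_min_attr == child_min_attr);
            child_needed[index] = parent_needed[index];
        }
        else
        {
            int child_attno = translated[attno - 1];

            /* Dropped in the parent: nothing to propagate. */
            if (child_attno == 0)
                continue;

            child_needed[child_attno - child_min_attr] = parent_needed[index];
        }
    }
}
```

The point of the translation step is that a child may store its columns in a different physical order than the parent, so the same set has to land in a different slot.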
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd1a58336b..fd68374e20 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of the child rel from the parent rel, such as
+ * the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of the child rel from the parent rel,
+ * such as the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index ef7173fbf8..142eecd733 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -301,5 +301,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
v23-0004-More-refactoring-around-partitioned-table-Append.patch (text/plain; charset=UTF-8)
From 68a16a921bc67728136b232c7a93af7bdf658478 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v23 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
---
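Because part_appinfos is kept in the same order as part_rels, a partition index is all that is needed to reach either one. A small sketch of walking a set of surviving partition indexes in that order follows. It is a model under stated assumptions, not planner code: mask_next_member and collect_live_partitions are hypothetical names, a 64-bit mask replaces the Bitmapset, and part_child_relid[] stands in for part_appinfos[i]->child_relid.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest member of 'set' greater than 'prev', or -1 if there is
 * none. This mimics the calling convention of bms_next_member() on a mask.
 */
int
mask_next_member(uint64_t set, int prev)
{
    int i;

    for (i = prev + 1; i < 64; i++)
    {
        if (set & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}

/*
 * Collect the child RT indexes of surviving partitions, in bound order.
 * Returns the number of entries written to live_out[].
 */
int
collect_live_partitions(const int *part_child_relid, int nparts,
                        uint64_t live_mask, int *live_out)
{
    int n = 0;
    int i = -1;

    while ((i = mask_next_member(live_mask, i)) >= 0)
    {
        assert(i < nparts);
        live_out[n++] = part_child_relid[i];
    }
    return n;
}
```

The loop shape (start at -1, continue while the result is non-negative) is the same idiom the patch uses when iterating join_relids.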
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 14 ++++
src/include/nodes/relation.h | 25 ++++++-
4 files changed, 122 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd68374e20..8f761a77e8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining live_partitioned_rels of the component partitioned
+ * tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b6c8..a81fed6d1d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5839,14 +5839,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..f3b9a2be32 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,9 +575,12 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -734,9 +745,12 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6bf68f31da..25333c5407 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,8 +529,12 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * live_part_appinfos - AppendRelInfo of unpruned partitions
+ * live_partitioned_rels - RT indexes of unpruned partitions that are
+ * partitioned tables themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -657,10 +661,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos
+ * of partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
v23-0005-Teach-planner-to-use-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 200dffcf2360acd7854397cd3b0b5187a730366f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v23 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
applies constraint exclusion against the partition constraint of each
partition: it compares the query's clauses against the partition
constraint and excludes the partition if the clauses refute the
latter. A dummy path is added for each partition that is excluded.
This algorithm takes linear time with a big constant, especially
given that we repeat the work of matching clauses to the partition
constraint for every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using populate_partition_clauses().
Then, if we pass the matched clauses to get_partitions_from_clauses(),
we'll get the set of matching partitions in much less time than by
running the matching algorithm separately for each partition.
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
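As a rough illustration of why looking at the parent's bounds once beats testing every partition, here is a toy version for a single integer key. It is not the algorithm in this patch: it assumes contiguous range partitions where partition i covers [bounds[i], bounds[i + 1]), ignores default partitions and NULLs, and uses the made-up names find_partition and prune_range.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the index of the partition containing 'key', or -1 if the key lies
 * outside every partition. bounds[] has nparts + 1 strictly increasing
 * entries; the search is a binary search over them.
 */
int
find_partition(const int *bounds, int nparts, int key)
{
    int lo = 0;
    int hi = nparts;

    if (nparts <= 0 || key < bounds[0] || key >= bounds[nparts])
        return -1;

    /* Invariant: bounds[lo] <= key < bounds[hi]. */
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;

        if (key >= bounds[mid])
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/*
 * For a clause "key BETWEEN qlo AND qhi", set *first and *last to the range
 * of partition indexes to scan. Returns false if no partition can match.
 */
bool
prune_range(const int *bounds, int nparts, int qlo, int qhi,
            int *first, int *last)
{
    if (nparts <= 0 || qlo > qhi ||
        qhi < bounds[0] || qlo >= bounds[nparts])
        return false;

    /* Clamp the query interval to the covered key space. */
    if (qlo < bounds[0])
        qlo = bounds[0];
    if (qhi >= bounds[nparts])
        qhi = bounds[nparts] - 1;

    *first = find_partition(bounds, nparts, qlo);
    *last = find_partition(bounds, nparts, qhi);
    return true;
}
```

Two binary searches cost O(log n) in the number of partitions, whereas refuting each partition's constraint separately is O(n) with a large per-partition constant.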
src/backend/optimizer/path/allpaths.c | 80 ++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 6 +-
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 434 ++++++++++++++++++++++----
src/test/regress/sql/partition_prune.sql | 77 ++++-
7 files changed, 592 insertions(+), 80 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8f761a77e8..af9658128e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -136,6 +137,9 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
/*
@@ -847,6 +851,77 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *result = NIL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (!clauses)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = lappend(result, rel->part_appinfos[i]);
+ }
+ else
+ {
+ Relation partrel;
+ Bitmapset *partindexes;
+ PartitionClauseInfo partclauseinfo;
+
+ partrel = heap_open(rte->relid, NoLock);
+
+ /* Process clauses and populate partclauseinfo */
+ populate_partition_clauses(partrel, rel->relid,
+ clauses, &partclauseinfo);
+
+ if (!partclauseinfo.constfalse)
+ {
+ PartitionPruneContext context;
+
+ context.rt_index = rel->relid;
+ context.relation = partrel;
+ context.clauseinfo = &partclauseinfo;
+
+ partindexes = get_partitions_from_clauses(&context);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ heap_close(partrel, NoLock);
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +963,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a35d068911..6949886e46 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1395,6 +1395,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even a dead partition look
+ * minimally valid, with a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..c1d4c7db5b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1930,6 +1939,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 25333c5407..5e1d4151c2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +352,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index aabb0240a9..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,28 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_t
- Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
- -> Seq Scan on boolpart_f
- Filter: a
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1040,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
On 31 January 2018 at 21:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Update patch set attached. Thanks again.
(My apologies for being slow to respond here. I've been on leave this
week and I'm off again next week. I now have a little time to reply.)
Hi Amit,
Thanks for incorporating my changes into the patchset. A while ago I
was rebasing the run-time pruning patch on top of this but ran into a
few problems, all of which result from my changes.
1. remove_redundant_clauses collapses the PartClause list into the
most restrictive set of clauses. This disallows multiple evaluations
of the PartitionClauseInfo during runtime pruning. I've written a
proposed fix for this and attached it.
2. PartitionClauseInfo->keyclauses is a list of PartClause which is
not a node type. This will cause _copyPartitionClauseInfo() to fail.
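To illustrate point 1, here is a minimal standalone sketch, not code from the attached patch. It assumes a much-simplified clause of the form "key < bound"; the names UpperBoundClause, MAX_KEYS and pick_minimal_clauses are hypothetical. The idea it shows is the one the fix relies on: the most restrictive clauses are written to a separate output array rather than overwriting the input, so the same input can be evaluated again later.

```c
#include <stddef.h>

#define MAX_KEYS 4				/* hypothetical stand-in for PARTITION_MAX_KEYS */

/* Hypothetical, simplified clause of the form "key < bound". */
typedef struct UpperBoundClause
{
	int			keyno;			/* which partition key the clause is on */
	int			bound;			/* the constant being compared against */
} UpperBoundClause;

/*
 * For each key, point minimal[key] at the most restrictive clause (the
 * one with the smallest bound), or leave it NULL if the key has no
 * clause.  The input array is only read, never modified, so this can be
 * called any number of times on the same input.
 */
void
pick_minimal_clauses(const UpperBoundClause *clauses, size_t nclauses,
					 const UpperBoundClause *minimal[MAX_KEYS])
{
	size_t		i;

	for (i = 0; i < MAX_KEYS; i++)
		minimal[i] = NULL;

	for (i = 0; i < nclauses; i++)
	{
		int			k = clauses[i].keyno;

		if (k < 0 || k >= MAX_KEYS)
			continue;			/* ignore clauses on unknown keys */

		if (minimal[k] == NULL || clauses[i].bound < minimal[k]->bound)
			minimal[k] = &clauses[i];
	}
}
```

Calling pick_minimal_clauses twice on the same array gives the same answer both times, which is the property run-time pruning needs when it re-evaluates the clause info.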
I'm still not quite sure of the best way to fix #2 since PartClause
contains a FmgrInfo. I do have a local fix which moves PartClause to
primnodes.h and makes it a proper node type. I also added a copy
function which does not copy any of the cache fields in PartClause. It
just sets valid_cache to false. I didn't particularly think this was
the correct fix. I just couldn't think of how exactly this should be
done at the time.
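The copy approach described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: DemoClause and copy_demo_clause are hypothetical names, the struct is not the real PartClause layout, and the cached_fn pointer merely stands in for lookup state such as an FmgrInfo. Only the valid_cache flag is taken from the description above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical struct: real data plus lazily filled cache fields. */
typedef struct DemoClause
{
	int			opno;			/* "real" data, must be copied */
	int			constval;		/* "real" data, must be copied */
	bool		valid_cache;	/* true once the cache fields are filled */
	void	   *cached_fn;		/* stand-in for cached lookup state */
} DemoClause;

/*
 * Copy only the real data.  The cache fields are not copied; the copy is
 * marked as having no valid cache so that its first user rebuilds it.
 * Returns NULL if allocation fails.
 */
DemoClause *
copy_demo_clause(const DemoClause *from)
{
	DemoClause *newclause = malloc(sizeof(DemoClause));

	if (newclause == NULL)
		return NULL;

	newclause->opno = from->opno;
	newclause->constval = from->constval;
	newclause->valid_cache = false;
	newclause->cached_fn = NULL;

	return newclause;
}
```

The trade-off is that each copy pays for rebuilding its cache once, in exchange for never sharing or duplicating lookup state that may not be safe to copy byte for byte.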
The attached patch also adds the missing nodetag to
PartitionClauseInfo and fixes up other code so that we don't memset
the node memory to zero, as that would destroy the tag. I ended up
just having extract_partition_key_clauses do the makeNode call. This
also resulted in populate_partition_clauses being renamed to
generate_partition_clauses.
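The reason a whole-struct memset is a problem once a struct carries a tag can be shown with a small standalone sketch. Everything here is hypothetical (DemoTag, DemoClauseInfo, make_demo_clause_info, reset_demo_clause_info) and only mimics the pattern of a tagged node; it is not PostgreSQL's node machinery.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag values; zero means "no valid tag". */
typedef enum DemoTag
{
	DEMO_TAG_INVALID = 0,
	DEMO_TAG_CLAUSEINFO = 1
} DemoTag;

/* Hypothetical tagged struct: the tag must be the first field. */
typedef struct DemoClauseInfo
{
	DemoTag		type;
	void	   *keyclauses[4];
	bool		constfalse;
	bool		foundkeyclauses;
} DemoClauseInfo;

/* Allocate zeroed memory and set the tag, in the spirit of makeNode. */
DemoClauseInfo *
make_demo_clause_info(void)
{
	DemoClauseInfo *info = calloc(1, sizeof(DemoClauseInfo));

	if (info == NULL)
		return NULL;
	info->type = DEMO_TAG_CLAUSEINFO;
	return info;
}

/*
 * Reset the payload field by field.  A memset over the whole struct
 * would also zero 'type', leaving the node with an invalid tag.
 */
void
reset_demo_clause_info(DemoClauseInfo *info)
{
	memset(info->keyclauses, 0, sizeof(info->keyclauses));
	info->constfalse = false;
	info->foundkeyclauses = false;
}
```

After reset_demo_clause_info the tag is still DEMO_TAG_CLAUSEINFO, whereas memset(info, 0, sizeof(*info)) would leave it as DEMO_TAG_INVALID, and any code that dispatches on the tag (such as a generic copy function) would no longer recognize the struct.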
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
PartitionClauseInfo_reevaluation.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 20e8de9..5a44947 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -296,11 +296,12 @@ static Bitmapset *get_partitions_excluded_by_ne_clauses(
static Bitmapset *get_partitions_from_or_clause_args(
PartitionPruneContext *context,
List *or_clause_args);
-static void extract_partition_key_clauses(PartitionKey partkey, List *clauses,
- int rt_index, PartitionClauseInfo *partclauses);
+static PartitionClauseInfo *extract_partition_key_clauses(
+ PartitionKey partkey, List *clauses,
+ int rt_index);
static bool extract_bounding_datums(PartitionKey partkey,
PartitionPruneContext *context,
- PartScanKeyInfo *keys);
+ List **minimalclauses, PartScanKeyInfo *keys);
static bool partition_cmp_args(PartitionKey key, int partkeyidx,
PartClause *pc, PartClause *leftarg, PartClause *rightarg,
bool *result);
@@ -309,7 +310,8 @@ static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *pc,
static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
Expr *expr, Datum *value);
static void remove_redundant_clauses(PartitionKey partkey,
- PartitionPruneContext *context);
+ PartitionPruneContext *context,
+ List **minimalclauses);
static Bitmapset *get_partitions_for_keys(Relation rel,
PartScanKeyInfo *keys);
static Bitmapset *get_partitions_for_keys_hash(Relation rel,
@@ -1690,7 +1692,7 @@ get_partition_qual_relid(Oid relid)
}
/*
- * populate_partition_clauses
+ * generate_partition_clauses
* Processes 'clauses' to try to match them to relation's partition
* keys. If any compatible clauses are found which match a partition
* key, then these clauses are stored in 'partclauseinfo'.
@@ -1700,10 +1702,9 @@ get_partition_qual_relid(Oid relid)
* so, then they must be aware that the 'partclauseinfo' may only be partially
* populated.
*/
-void
-populate_partition_clauses(Relation relation,
- int rt_index, List *clauses,
- PartitionClauseInfo *partclauseinfo)
+PartitionClauseInfo *
+generate_partition_clauses(Relation relation,
+ int rt_index, List *clauses)
{
PartitionDesc partdesc;
PartitionKey partkey;
@@ -1744,7 +1745,7 @@ populate_partition_clauses(Relation relation,
clauses = list_concat(clauses, partqual);
}
- extract_partition_key_clauses(partkey, clauses, rt_index, partclauseinfo);
+ return extract_partition_key_clauses(partkey, clauses, rt_index);
}
/*
@@ -1784,15 +1785,19 @@ get_partitions_from_clauses(PartitionPruneContext *context)
else
{
PartitionKey partkey = RelationGetPartitionKey(context->relation);
+ List *minimalclauses[PARTITION_MAX_KEYS];
- /* collapse clauses down to the most restrictive set */
- remove_redundant_clauses(partkey, context);
+ /*
+ * Populate minimal clauses with the most restrictive
+ * of clauses from context's partclauseinfo.
+ */
+ remove_redundant_clauses(partkey, context, minimalclauses);
/* Did remove_redundant_clauses find any contradicting clauses? */
if (partclauseinfo->constfalse)
return NULL;
- if (extract_bounding_datums(partkey, context, &keys))
+ if (extract_bounding_datums(partkey, context, minimalclauses, &keys))
{
result = get_partitions_for_keys(context->relation, &keys);
@@ -1995,14 +2000,14 @@ get_partitions_from_or_clause_args(PartitionPruneContext *context,
foreach(lc, or_clause_args)
{
List *clauses = list_make1(lfirst(lc));
- PartitionClauseInfo partclauseinfo;
+ PartitionClauseInfo *partclauseinfo;
PartitionPruneContext subcontext;
Bitmapset *arg_partset;
- extract_partition_key_clauses(partkey, clauses, context->rt_index,
- &partclauseinfo);
+ partclauseinfo = extract_partition_key_clauses(partkey, clauses,
+ context->rt_index);
- if (!partclauseinfo.foundkeyclauses)
+ if (!partclauseinfo->foundkeyclauses)
{
List *partconstr = RelationGetPartitionQual(context->relation);
PartitionDesc partdesc;
@@ -2024,7 +2029,7 @@ get_partitions_from_or_clause_args(PartitionPruneContext *context,
subcontext.rt_index = context->rt_index;
subcontext.relation = context->relation;
- subcontext.clauseinfo = &partclauseinfo;
+ subcontext.clauseinfo = partclauseinfo;
arg_partset = get_partitions_from_clauses(&subcontext);
result = bms_add_members(result, arg_partset);
@@ -2058,15 +2063,21 @@ get_partitions_from_or_clause_args(PartitionPruneContext *context,
* processing any further clauses. In this case, the caller must be careful
* not to assume the PartitionClauseInfo is fully populated with all clauses.
*/
-static void
+static PartitionClauseInfo *
extract_partition_key_clauses(PartitionKey partkey, List *clauses,
- int rt_index,
- PartitionClauseInfo *partclauseinfo)
+ int rt_index)
{
+ PartitionClauseInfo *partclauseinfo = makeNode(PartitionClauseInfo);
int i;
ListCell *lc;
- memset(partclauseinfo, 0, sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
foreach(lc, clauses)
{
@@ -2082,7 +2093,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
!DatumGetBool(((Const *) clause)->constvalue))
{
partclauseinfo->constfalse = true;
- return;
+ return partclauseinfo;
}
}
@@ -2261,7 +2272,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (bms_is_member(i, partclauseinfo->keyisnull))
{
partclauseinfo->constfalse = true;
- return;
+ return partclauseinfo;
}
/* Record that a strict clause has been seen for this key */
partclauseinfo->keyisnotnull =
@@ -2427,7 +2438,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (bms_is_member(i, partclauseinfo->keyisnotnull))
{
partclauseinfo->constfalse = true;
- return;
+ return partclauseinfo;
}
partclauseinfo->keyisnull =
bms_add_member(partclauseinfo->keyisnull,
@@ -2439,7 +2450,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
if (bms_is_member(i, partclauseinfo->keyisnull))
{
partclauseinfo->constfalse = true;
- return;
+ return partclauseinfo;
}
partclauseinfo->keyisnotnull =
@@ -2513,6 +2524,8 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
}
}
}
+
+ return partclauseinfo;
}
/*
@@ -2530,7 +2543,7 @@ extract_partition_key_clauses(PartitionKey partkey, List *clauses,
*/
static bool
extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
- PartScanKeyInfo *keys)
+ List **minimalclauses, PartScanKeyInfo *keys)
{
PartitionClauseInfo *clauseinfo = context->clauseinfo;
bool need_next_eq,
@@ -2563,7 +2576,7 @@ extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
memset(keys, 0, sizeof(PartScanKeyInfo));
for (i = 0; i < partkey->partnatts; i++)
{
- List *clauselist = clauseinfo->keyclauses[i];
+ List *clauselist = minimalclauses[i];
/*
* Min and max keys must constitute a prefix of the partition key and
@@ -2797,7 +2810,8 @@ partkey_datum_from_expr(PartitionKey key, int partkeyidx,
*/
static void
remove_redundant_clauses(PartitionKey partkey,
- PartitionPruneContext *context)
+ PartitionPruneContext *context,
+ List **minimalclauses)
{
PartClause *hash_clause,
*btree_clauses[BTMaxStrategyNumber];
@@ -2806,14 +2820,13 @@ remove_redundant_clauses(PartitionKey partkey,
int s;
int i;
bool test_result;
- List *newlist;
for (i = 0; i < partkey->partnatts; i++)
{
List *keyclauses = partclauseinfo->keyclauses[i];
+ minimalclauses[i] = NIL;
hash_clause = NULL;
- newlist = NIL;
memset(btree_clauses, 0, sizeof(btree_clauses));
@@ -2863,7 +2876,7 @@ remove_redundant_clauses(PartitionKey partkey,
* partition-pruning with it.
*/
else
- newlist = lappend(newlist, pc);
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
/*
* The code below handles btree operators, so not relevant for
@@ -2924,7 +2937,7 @@ remove_redundant_clauses(PartitionKey partkey,
* the previous one in btree_clauses[s] and push this one directly
* to the output list.
*/
- newlist = lappend(newlist, pc);
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
}
}
}
@@ -2933,9 +2946,7 @@ remove_redundant_clauses(PartitionKey partkey,
{
/* Note we didn't add this one to the result yet. */
if (hash_clause)
- newlist = lappend(newlist, hash_clause);
- list_free(partclauseinfo->keyclauses[i]);
- partclauseinfo->keyclauses[i] = newlist;
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
continue;
}
@@ -3026,15 +3037,9 @@ remove_redundant_clauses(PartitionKey partkey,
for (s = 0; s < BTMaxStrategyNumber; s++)
{
if (btree_clauses[s])
- newlist = lappend(newlist, btree_clauses[s]);
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
}
-
- /*
- * Replace the old List with the new one with the redundant clauses
- * removed.
- */
- list_free(partclauseinfo->keyclauses[i]);
- partclauseinfo->keyclauses[i] = newlist;
}
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index af96581..f6a4e3d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -875,21 +875,21 @@ get_append_rel_partitions(PlannerInfo *root,
{
Relation partrel;
Bitmapset *partindexes;
- PartitionClauseInfo partclauseinfo;
+ PartitionClauseInfo *partclauseinfo;
partrel = heap_open(rte->relid, NoLock);
- /* Process clauses and populate partclauseinfo */
- populate_partition_clauses(partrel, rel->relid,
- clauses, &partclauseinfo);
+ /* process clauses and generate the partclauseinfo */
+ partclauseinfo = generate_partition_clauses(partrel, rel->relid,
+ clauses);
- if (!partclauseinfo.constfalse)
+ if (!partclauseinfo->constfalse)
{
PartitionPruneContext context;
context.rt_index = rel->relid;
context.relation = partrel;
- context.clauseinfo = &partclauseinfo;
+ context.clauseinfo = partclauseinfo;
partindexes = get_partitions_from_clauses(&context);
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 78d43ea..0631d3d 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -81,9 +81,8 @@ extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
/* For partition-pruning */
-extern void populate_partition_clauses(Relation relation,
- int rt_index, List *clauses,
- PartitionClauseInfo *partclauseinfo);
+PartitionClauseInfo *generate_partition_clauses(Relation relation,
+ int rt_index, List *clauses);
extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
#endif /* PARTITION_H */
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 642ea0f..54c678b 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1517,6 +1517,8 @@ typedef struct OnConflictExpr
*/
typedef struct PartitionClauseInfo
{
+ NodeTag type;
+
/* Lists of clauses indexed by the partition key */
List *keyclauses[PARTITION_MAX_KEYS];
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
This is a pretty strange definition. datums/ndatums are never valid
at the same time as any of lbound/rbound/hbound, but are not included
in the union. Also, is_bound doesn't tell you which of
rbound/lbound/hbound is actually valid. Granted, the current calling
convention looks like a mess, too. Apparently, the argument to
partition_bound_cmp is a PartitionBoundSpec if using hash
partitioning, a Datum if list partitioning, and either a
PartitionRangeBound or a Datum * if range partitioning depending on
the value of probe_is_bound, and I use the word "apparently" because
there are zero words of comments explaining what the argument to
partition_bound_cmp of type "void *" is supposed to mean. I really
should have noticed that and insisted that it be fixed before
partitioning got committed.
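To make the objection concrete, here is a minimal sketch of what a self-describing probe argument could look like. It is not taken from the patch: ProbeKind, Probe, RangeBoundSketch, HashBoundSketch and probe_nvalues are hypothetical names, and plain longs stand in for Datums. The point is only that a tag field says which union member is valid, and that the tuple-datums case lives inside the union alongside the bound cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real bound types; not PostgreSQL code. */
typedef struct RangeBoundSketch
{
    int         ndatums;
    const long *datums;
    bool        lower;
} RangeBoundSketch;

typedef struct HashBoundSketch
{
    int         modulus;
    int         remainder;
} HashBoundSketch;

/* The tag: exactly one union member below is valid for each value. */
typedef enum ProbeKind
{
    PROBE_LIST_VALUE,
    PROBE_RANGE_BOUND,
    PROBE_HASH_BOUND,
    PROBE_TUPLE_DATUMS
} ProbeKind;

typedef struct Probe
{
    ProbeKind   kind;
    union
    {
        long        list_value;
        const RangeBoundSketch *rbound;
        HashBoundSketch hbound;
        struct
        {
            const long *datums;
            int         ndatums;
        }           tuple;      /* mutually exclusive with the bounds */
    }           u;
} Probe;

/* Number of key values carried by the probe, decided by the tag alone. */
static int
probe_nvalues(const Probe *p)
{
    switch (p->kind)
    {
        case PROBE_LIST_VALUE:
            return 1;
        case PROBE_RANGE_BOUND:
            return p->u.rbound->ndatums;
        case PROBE_HASH_BOUND:
            return 2;           /* modulus and remainder */
        case PROBE_TUPLE_DATUMS:
            return p->u.tuple.ndatums;
    }
    assert(!"unknown probe kind");
    return -1;
}
```

Even in this form every consumer still has to switch on the tag, so a tagged struct only addresses the ambiguity, not the extra dispatch.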
Looking a bit further, there are exactly two calls to
partition_bound_cmp(). One is in partition_bound_bsearch() and the
other is in check_new_partition_bound(). Now, looking at this, both
the call to partition_bound_cmp() and every single call to
partition_bound_bsearch() are inside a switch branch where we've
dispatched on the partitioning type, which means that from code that
is already specialized by partitioning type we are calling generic
code which then turns back around and calls code that is specialized
by partitioning type. Now, that could make sense if the generic code
is pretty complex, but here it's basically just the logic to do a
bsearch. It seems to me that a cleaner solution here would be to
duplicate that logic. Then we could have...
static int partition_list_bsearch(PartitionKey key, PartitionBoundInfo
boundinfo,
Datum value, bool *is_equal);
static int partition_range_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe);
static int partition_range_datum_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values);
static int partition_hash_bsearch(PartitionKey key, PartitionBoundInfo
boundinfo,
int modulus, int remainder, bool *is_equal);
...which would involve fewer branches at runtime and more type-safety
at compile time. partition_hash_bsearch could directly call
partition_hbound_cmp, partition_list_bsearch could directly invoke
FunctionCall2Coll, partition_range_bsearch could directly call
partition_rbound_cmp, and partition_range_datum_bsearch could directly
call partition_rbound_datum_cmp.
All-in-all that seems a lot nicer to me than what we have here now.
IIUC, the purpose of this patch is to let you search on a prefix of
the partition keys, but I think that's really only possible for range
partitioning, and it seems like the proposed nvalues argument to
partition_range_datum_bsearch would give you what you need.
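For illustration, the search logic that each specialized function would duplicate is small. The following is a standalone sketch, not the patch's code: it assumes a sorted array of plain ints in place of boundinfo->datums and a direct integer comparison in place of the partition support function, and list_bsearch_sketch is a made-up name. It returns the index of the greatest bound that is less than or equal to the probe, or -1 if every bound is greater.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the index of the greatest element of the sorted array 'bounds'
 * that is <= 'value', or -1 if all elements are greater.  *is_equal is set
 * to true if the element at the returned index equals 'value'.
 */
static int
list_bsearch_sketch(const int *bounds, int nbounds, int value, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    assert(nbounds >= 0);
    *is_equal = false;

    while (lo < hi)
    {
        /* Round up, so that assigning lo = mid always makes progress. */
        int         mid = lo + (hi - lo + 1) / 2;

        if (bounds[mid] <= value)
        {
            lo = mid;
            *is_equal = (bounds[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }

    return lo;
}
```

With bounds {10, 20, 30}, a probe of 25 lands on index 1 with *is_equal false, and a probe of 5 returns -1. A range or hash variant would differ only in the comparison performed at mid.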
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks for the review.
On 2018/02/02 7:38, Robert Haas wrote:
+/*
+ * PartitionBoundCmpArg - Caller-defined argument to be passed to
+ * partition_bound_cmp()
+ *
+ * The first (fixed) argument involved in a comparison is the partition bound
+ * found in the catalog, while an instance of the following struct describes
+ * either a new partition bound being compared against existing bounds
+ * (caller should set is_bound to true and set bound), or a new tuple's
+ * partition key specified in datums (caller should set ndatums to the number
+ * of valid datums that are passed in the array).
+ */
+typedef struct PartitionBoundCmpArg
+{
+ bool is_bound;
+ union
+ {
+ PartitionListValue *lbound;
+ PartitionRangeBound *rbound;
+ PartitionHashBound *hbound;
+ } bound;
+
+ Datum *datums;
+ int ndatums;
+} PartitionBoundCmpArg;
This is a pretty strange definition. datums/ndatums are never valid
at the same time as any of lbound/rbound/hbound, but are not included
in the union. Also, is_bound doesn't tell you which of
rbound/lbound/hbound is actually valid. Granted, the current calling
convention looks like a mess, too. Apparently, the argument to
partition_bound_cmp is a PartitionBoundSpec if using hash
partitioning, a Datum if list partitioning, and either a
PartitionRangeBound or a Datum * if range partitioning depending on
the value of probe_is_bound, and I use the word "apparently" because
there are zero words of comments explaining what the argument to
partition_bound_cmp of type "void *" is supposed to mean. I really
should have noticed that and insisted that it be fixed before
partitioning got committed.
Yeah, I was trying to fix the status quo by introducing that new struct,
but I agree it's much better to modify the functions around a bit like the
way you describe below.
Looking a bit further, there are exactly two calls to
partition_bound_cmp(). One is in partition_bound_bsearch() and the
other is in check_new_partition_bound(). Now, looking at this, both
the call to partition_bound_cmp() and every single call to
partition_bound_bsearch() are inside a switch branch where we've
dispatched on the partitioning type, which means that from code that
is already specialized by partitioning type we are calling generic
code which then turns back around and calls code that is specialized
by partitioning type. Now, that could make sense if the generic code
is pretty complex, but here it's basically just the logic to do a
bsearch. It seems to me that a cleaner solution here would be to
duplicate that logic. Then we could have...
static int partition_list_bsearch(PartitionKey key, PartitionBoundInfo
boundinfo,
Datum value, bool *is_equal);
static int partition_range_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe);
static int partition_range_datum_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values);
static int partition_hash_bsearch(PartitionKey key, PartitionBoundInfo
boundinfo,
int modulus, int remainder, bool *is_equal);
...which would involve fewer branches at runtime and more type-safety
at compile time. partition_hash_bsearch could directly call
partition_hbound_cmp, partition_list_bsearch could directly invoke
FunctionCall2Coll, partition_range_bsearch could directly call
partition_rbound_cmp, and partition_range_datum_bsearch could directly
call partition_rbound_datum_cmp.
All-in-all that seems a lot nicer to me than what we have here now.
IIUC, the purpose of this patch is to let you search on a prefix of
the partition keys, but I think that's really only possible for range
partitioning, and it seems like the proposed nvalues argument to
partition_range_datum_bsearch would give you what you need.
Your proposed cleanup sounds much better, so I implemented it in the
attached updated 0001, while dropping the previously proposed
PartitionBoundCmpArg approach.
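As a rough illustration of the prefix lookup that the nvalues argument is meant to allow, here is a standalone sketch. It is not code from the patch: plain ints stand in for Datums, MINVALUE/MAXVALUE handling is left out, and NKEYS, bound_prefix_cmp and range_prefix_bsearch are made-up names. Only the first nvalues columns of each bound take part in the comparison.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2                 /* assumed number of partition key columns */

/* Compare only the first nvalues columns of a bound with the probe values. */
static int
bound_prefix_cmp(const int *bound, const int *values, int nvalues)
{
    for (int i = 0; i < nvalues; i++)
    {
        if (bound[i] < values[i])
            return -1;
        if (bound[i] > values[i])
            return 1;
    }
    return 0;                   /* equal on the whole prefix */
}

/*
 * Returns the index of the greatest bound whose prefix is <= the probe
 * prefix, or -1 if all bounds are greater.  There is no early exit on
 * equality here, so when several bounds share the prefix the last of them
 * is the one returned.
 */
static int
range_prefix_bsearch(const int (*bounds)[NKEYS], int nbounds,
                     const int *values, int nvalues, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    assert(nvalues >= 0 && nvalues <= NKEYS);
    *is_equal = false;

    while (lo < hi)
    {
        int         mid = lo + (hi - lo + 1) / 2;
        int         cmpval = bound_prefix_cmp(bounds[mid], values, nvalues);

        if (cmpval <= 0)
        {
            lo = mid;
            *is_equal = (cmpval == 0);
        }
        else
            hi = mid - 1;
    }

    return lo;
}
```

Passing nvalues equal to NKEYS gives an ordinary full-key lookup, which corresponds to tuple routing passing key->partnatts.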
Updated set of patches attached (patches 0002 onward mostly unchanged,
except I incorporated the delta patch posted by David upthread).
Thanks,
Amit
Attachments:
v24-0001-Refactor-code-for-partition-bound-searching.patch (text/plain; charset=UTF-8)
From 2e152cfa2ce1f12acd3aa35f8ec7b5618f0af29b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 11:33:27 +0900
Subject: [PATCH v24 1/5] Refactor code for partition bound searching
Currently, for all partitioning types and for different purposes,
a single partition_bound_bsearch() is used, which led to an ugly
interface. Instead, break it down into one function each for list,
range, and hash partitioning. For range partitioning, we need the
ability to look up only a prefix of the partition key, so its
function's interface needs to allow for specifying such input tuple.
---
src/backend/catalog/partition.c | 265 ++++++++++++++++++++++++++--------------
1 file changed, 170 insertions(+), 95 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 45945511f0..31c80c7f1a 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -170,14 +170,21 @@ static int32 partition_rbound_cmp(PartitionKey key,
bool lower1, PartitionRangeBound *b2);
static int32 partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums);
+ Datum *tuple_datums, int n_tuple_datums);
-static int32 partition_bound_cmp(PartitionKey key,
- PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound);
-static int partition_bound_bsearch(PartitionKey key,
+static int partition_list_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal);
+static int partition_range_bsearch(PartitionKey key,
PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal);
+ PartitionRangeBound *probe, bool *is_equal);
+static int partition_range_datum_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal);
+static int partition_hash_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int modulus, int remainder);
+
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
@@ -981,8 +988,7 @@ check_new_partition_bound(char *relname, Relation parent,
int greatest_modulus;
int remainder;
int offset;
- bool equal,
- valid_modulus = true;
+ bool valid_modulus = true;
int prev_modulus, /* Previous largest modulus */
next_modulus; /* Next largest modulus */
@@ -995,12 +1001,13 @@ check_new_partition_bound(char *relname, Relation parent,
* modulus 10 and a partition with modulus 15, because 10
* is not a factor of 15.
*
- * Get greatest bound in array boundinfo->datums which is
- * less than or equal to spec->modulus and
- * spec->remainder.
+ * Get the greatest (modulus, remainder) pair contained in
+ * boundinfo->datums that is less than or equal to the
+ * (spec->modulus, spec->remainder) pair.
*/
- offset = partition_bound_bsearch(key, boundinfo, spec,
- true, &equal);
+ offset = partition_hash_bsearch(key, boundinfo,
+ spec->modulus,
+ spec->remainder);
if (offset < 0)
{
next_modulus = DatumGetInt32(datums[0][0]);
@@ -1074,9 +1081,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_bound_bsearch(key, boundinfo,
- &val->constvalue,
- true, &equal);
+ offset = partition_list_bsearch(key, boundinfo,
+ val->constvalue,
+ &equal);
if (offset >= 0 && equal)
{
overlap = true;
@@ -1148,8 +1155,8 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_bound_bsearch(key, boundinfo, lower,
- true, &equal);
+ offset = partition_range_bsearch(key, boundinfo, lower,
+ &equal);
if (boundinfo->indexes[offset + 1] < 0)
{
@@ -1162,10 +1169,16 @@ check_new_partition_bound(char *relname, Relation parent,
if (offset + 1 < boundinfo->ndatums)
{
int32 cmpval;
+ Datum *datums;
+ PartitionRangeDatumKind *kind;
+ bool is_lower;
+
+ datums = boundinfo->datums[offset + 1];
+ kind = boundinfo->kind[offset + 1];
+ is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_bound_cmp(key, boundinfo,
- offset + 1, upper,
- true);
+ cmpval = partition_rbound_cmp(key, datums, kind,
+ is_lower, upper);
if (cmpval < 0)
{
/*
@@ -2574,11 +2587,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_bound_bsearch(key,
- partdesc->boundinfo,
- values,
- false,
- &equal);
+ bound_offset = partition_list_bsearch(key,
+ partdesc->boundinfo,
+ values[0], &equal);
if (bound_offset >= 0 && equal)
part_index = partdesc->boundinfo->indexes[bound_offset];
}
@@ -2605,12 +2616,11 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_bound_bsearch(key,
- partdesc->boundinfo,
- values,
- false,
- &equal);
-
+ bound_offset = partition_range_datum_bsearch(key,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2881,12 +2891,12 @@ partition_rbound_cmp(PartitionKey key,
static int32
partition_rbound_datum_cmp(PartitionKey key,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums)
+ Datum *tuple_datums, int n_tuple_datums)
{
int i;
int32 cmpval = -1;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < n_tuple_datums; i++)
{
if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
return -1;
@@ -2905,84 +2915,104 @@ partition_rbound_datum_cmp(PartitionKey key,
}
/*
- * partition_bound_cmp
+ * partition_list_bsearch
+ * Returns the index of the greatest bound datum that is less than or equal
+ * to the given value or -1 if all of the bound datums are greater
*
- * Return whether the bound at offset in boundinfo is <, =, or > the argument
- * specified in *probe.
+ * *is_equal is set to true if the bound datum at the returned index is equal
+ * to the input value.
*/
-static int32
-partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
- int offset, void *probe, bool probe_is_bound)
+static int
+partition_list_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal)
{
- Datum *bound_datums = boundinfo->datums[offset];
- int32 cmpval = -1;
+ int lo,
+ hi,
+ mid;
- switch (key->strategy)
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
{
- case PARTITION_STRATEGY_HASH:
- {
- PartitionBoundSpec *spec = (PartitionBoundSpec *) probe;
+ int32 cmpval;
- cmpval = partition_hbound_cmp(DatumGetInt32(bound_datums[0]),
- DatumGetInt32(bound_datums[1]),
- spec->modulus, spec->remainder);
+ mid = (lo + hi + 1) / 2;
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ boundinfo->datums[mid][0],
+ value));
+ if (cmpval <= 0)
+ {
+ lo = mid;
+ *is_equal = (cmpval == 0);
+ if (*is_equal)
break;
- }
- case PARTITION_STRATEGY_LIST:
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- bound_datums[0],
- *(Datum *) probe));
- break;
+ }
+ else
+ hi = mid - 1;
+ }
- case PARTITION_STRATEGY_RANGE:
- {
- PartitionRangeDatumKind *kind = boundinfo->kind[offset];
+ return lo;
+}
- if (probe_is_bound)
- {
- /*
- * We need to pass whether the existing bound is a lower
- * bound, so that two equal-valued lower and upper bounds
- * are not regarded equal.
- */
- bool lower = boundinfo->indexes[offset] < 0;
+/*
+ * partition_range_bsearch
+ * Returns the index of the greatest range bound that is less than or
+ * equal to the given range bound or -1 if all of the range bounds are
+ * greater
+ *
+ * *is_equal is set to true if the range bound at the returned index is equal
+ * to the input range bound
+ */
+static int
+partition_range_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ PartitionRangeBound *probe, bool *is_equal)
+{
+ int lo,
+ hi,
+ mid;
- cmpval = partition_rbound_cmp(key,
- bound_datums, kind, lower,
- (PartitionRangeBound *) probe);
- }
- else
- cmpval = partition_rbound_datum_cmp(key,
- bound_datums, kind,
- (Datum *) probe);
- break;
- }
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval;
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
+ mid = (lo + hi + 1) / 2;
+ cmpval = partition_rbound_cmp(key,
+ boundinfo->datums[mid],
+ boundinfo->kind[mid],
+ (boundinfo->indexes[mid] == -1),
+ probe);
+ if (cmpval <= 0)
+ {
+ lo = mid;
+ *is_equal = (cmpval == 0);
+
+ if (*is_equal)
+ break;
+ }
+ else
+ hi = mid - 1;
}
- return cmpval;
+ return lo;
}
/*
- * Binary search on a collection of partition bounds. Returns greatest
- * bound in array boundinfo->datums which is less than or equal to *probe.
- * If all bounds in the array are greater than *probe, -1 is returned.
- *
- * *probe could either be a partition bound or a Datum array representing
- * the partition key of a tuple being routed; probe_is_bound tells which.
- * We pass that down to the comparison function so that it can interpret the
- * contents of *probe accordingly.
+ * partition_range_datum_bsearch
+ * Returns the index of the greatest range bound that is less than or
+ * equal to the given tuple or -1 if all of the range bounds are greater
*
- * *is_equal is set to whether the bound at the returned index is equal with
- * *probe.
+ * *is_equal is set to true if the range bound at the returned index is equal
+ * to the input tuple.
*/
static int
-partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
- void *probe, bool probe_is_bound, bool *is_equal)
+partition_range_datum_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal)
{
int lo,
hi,
@@ -2995,8 +3025,11 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
- probe_is_bound);
+ cmpval = partition_rbound_datum_cmp(key,
+ boundinfo->datums[mid],
+ boundinfo->kind[mid],
+ values,
+ nvalues);
if (cmpval <= 0)
{
lo = mid;
@@ -3013,6 +3046,48 @@ partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo,
}
/*
+ * partition_hash_bsearch
+ * Returns the index of the greatest (modulus, remainder) pair that is
+ * less than or equal to the given (modulus, remainder) pair or -1 if
+ * all of them are greater
+ */
+static int
+partition_hash_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int modulus, int remainder)
+{
+ int lo,
+ hi,
+ mid;
+
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval,
+ bound_modulus,
+ bound_remainder;
+
+ mid = (lo + hi + 1) / 2;
+ bound_modulus = DatumGetInt32(boundinfo->datums[mid][0]);
+ bound_remainder = DatumGetInt32(boundinfo->datums[mid][1]);
+ cmpval = partition_hbound_cmp(bound_modulus, bound_remainder,
+ modulus, remainder);
+ if (cmpval <= 0)
+ {
+ lo = mid;
+
+ if (cmpval == 0)
+ break;
+ }
+ else
+ hi = mid - 1;
+ }
+
+ return lo;
+}
+
+/*
* get_default_oid_from_partdesc
*
* Given a partition descriptor, return the OID of the default partition, if
--
2.11.0
v24-0002-Introduce-a-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From bddc086b7592174c847ec7509e51d5955bc49d97 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v24 2/5] Introduce a get_partitions_from_clauses()
Whereas get_partition_for_tuple() takes a tuple and returns index
of the partition of the table that should contain that tuple,
get_partitions_from_clauses() will take a list of clauses and return
a set of indexes of the partitions that satisfy all of those clauses.
Aforementioned list of clauses must be all clauses that were matched
to the partition key(s) using generate_partition_clauses().
It is meant as a faster alternative to the planner's current method
of selecting a table's partitions by running the constraint exclusion
algorithm against the partition constraint of each of the partitions.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 2061 ++++++++++++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/util/clauses.c | 4 +-
src/include/catalog/partition.h | 12 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 33 +
src/include/optimizer/clauses.h | 2 +
8 files changed, 2135 insertions(+), 3 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c80c7f1a..87c55913f6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -28,6 +28,8 @@
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_opclass.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
#include "catalog/pg_partitioned_table.h"
#include "catalog/pg_type.h"
#include "commands/tablecmds.h"
@@ -38,6 +40,8 @@
#include "nodes/parsenodes.h"
#include "optimizer/clauses.h"
#include "optimizer/planmain.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
@@ -138,6 +142,81 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datums values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of a IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
void *arg);
@@ -192,6 +271,37 @@ static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_excluded_by_ne_clauses(
+ PartitionPruneContext *context,
+ List *ne_clauses);
+static Bitmapset *get_partitions_from_or_clause_args(
+ PartitionPruneContext *context,
+ List *or_clause_args);
+static PartitionClauseInfo *extract_partition_key_clauses(
+ PartitionKey partkey, List *clauses,
+ int rt_index);
+static bool extract_bounding_datums(PartitionKey partkey,
+ PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static bool partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static PartOpStrategy partition_op_strategy(PartitionKey key, PartClause *pc,
+ bool *incl);
+static bool partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value);
+static void remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context,
+ List **minimalclauses);
+static Bitmapset *get_partitions_for_keys(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_hash(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(Relation rel,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(Relation rel,
+ PartScanKeyInfo *keys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1550,9 +1660,1960 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * generate_partition_clauses
+ * Processes 'clauses' to try to match them to relation's partition
+ * keys. If any compatible clauses are found which match a partition
+ * key, then these clauses are stored in 'partclauseinfo'.
+ *
+ * The caller must ensure that 'clauses' is not an empty List. Upon return,
+ * callers must also check if the 'partclauseinfo' constfalse has been set, if
+ * so, then they must be aware that the 'partclauseinfo' may only be partially
+ * populated.
+ */
+PartitionClauseInfo *
+generate_partition_clauses(Relation relation,
+ int rt_index, List *clauses)
+{
+ PartitionDesc partdesc;
+ PartitionKey partkey;
+ PartitionBoundInfo boundinfo;
+
+ Assert(clauses != NIL);
+
+ partkey = RelationGetPartitionKey(relation);
+ partdesc = RelationGetPartitionDesc(relation);
+
+ /* Some functions called below modify this list */
+ clauses = list_copy(clauses);
+ boundinfo = partdesc->boundinfo;
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement
+ * is perhaps unlikely for non-default partitions, but it may be more
+ * likely in the case of default partitions, so we'll add the parent
+ * partition table's partition qual to the clause list in this case only.
+ * This may result in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ List *partqual = RelationGetPartitionQual(relation);
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rt_index != 1)
+ ChangeVarNodes((Node *) partqual, 1, rt_index, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ return extract_partition_key_clauses(partkey, clauses, rt_index);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine all partitions of context->relation that could possibly
+ * contain a record that matches clauses as described in
+ * context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartitionDesc partdesc;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+
+ /*
+ * Check if there were proofs that no partitions can match due to some
+ * clause items contradicting another.
+ */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ partdesc = RelationGetPartitionDesc(context->relation);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ else
+ {
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * Populate minimal clauses with the most restrictive
+ * of clauses from context's partclauseinfo.
+ */
+ remove_redundant_clauses(partkey, context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(partkey, context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context->relation, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have the values we'd need to eliminate
+ * partitions using get_partitions_for_keys, likely because
+ * context->clauseinfo only contained <> clauses and/or OR
+ * clauses, which are handled further below in this function.
+ */
+ result = bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+ }
+
+ /* Select partitions by applying the clauses containing <> operators. */
+ if (partclauseinfo->ne_clauses)
+ {
+ Bitmapset *ne_parts;
+
+ ne_parts = get_partitions_excluded_by_ne_clauses(context,
+ partclauseinfo->ne_clauses);
+
+ /* Remove any partitions we found to not be needed */
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ PartitionPruneContext orcontext;
+ Bitmapset *or_parts;
+
+ orcontext.rt_index = context->rt_index;
+ orcontext.relation = context->relation;
+ orcontext.clauseinfo = NULL;
+
+ or_parts = get_partitions_from_or_clause_args(&orcontext, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_excluded_by_ne_clauses
+ *
+ * Returns a Bitmapset of partition indexes of any partition that can safely
+ * be removed due to 'ne_clauses' containing not-equal clauses for all
+ * possible values that the partition can contain.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_clauses(PartitionPruneContext *context,
+ List *ne_clauses)
+{
+ ListCell *lc;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+ Relation relation = context->relation;
+ PartitionKey partkey = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int *datums_in_part;
+ int *datums_found;
+ int i;
+
+ Assert(partkey->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * Build a Bitmapset to record the indexes of all datums of the
+ * query that are found in boundinfo.
+ */
+ foreach(lc, ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(partkey, 0, pc->value, &datum))
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partkey, boundinfo, datum,
+ &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * partdesc->nparts);
+ datums_found = (int *) palloc0(sizeof(int) * partdesc->nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < partdesc->nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
+ * get_partitions_from_or_clause_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_clause_args.
+ */
+static Bitmapset *
+get_partitions_from_or_clause_args(PartitionPruneContext *context,
+ List *or_clause_args)
+{
+ PartitionKey partkey = RelationGetPartitionKey(context->relation);
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_clause_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionClauseInfo *partclauseinfo;
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ partclauseinfo = extract_partition_key_clauses(partkey, clauses,
+ context->rt_index);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ List *partconstr = RelationGetPartitionQual(context->relation);
+ PartitionDesc partdesc;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->rt_index != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->rt_index,
+ 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ partdesc = RelationGetPartitionDesc(context->relation);
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+ }
+
+ subcontext.rt_index = context->rt_index;
+ subcontext.relation = context->relation;
+ subcontext.clauseinfo = partclauseinfo;
+ arg_partset = get_partitions_from_clauses(&subcontext);
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/* Match partition key (partattno/partexpr) to an expression (expr). */
+#define EXPR_MATCHES_PARTKEY(expr, partattno, partexpr) \
+ ((partattno) != 0 ? \
+ (IsA((expr), Var) && \
+ ((Var *) (expr))->varattno == (partattno)) : \
+ equal((expr), (partexpr)))
+
+#define COLLATION_MATCH(partcoll, exprcoll) \
+ (!OidIsValid(partcoll) || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_key_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to the particular
+ * key in 'partclauseinfo'. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing the
+ * <> operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * partclauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the PartitionClauseInfo is fully populated with all clauses.
+ */
+static PartitionClauseInfo *
+extract_partition_key_clauses(PartitionKey partkey, List *clauses,
+ int rt_index)
+{
+ PartitionClauseInfo *partclauseinfo = makeNode(PartitionClauseInfo);
+ int i;
+ ListCell *lc;
+
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ ListCell *partexprs_item;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ partexprs_item = list_head(partkey->partexprs);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ PartClause *pc;
+ Oid partopfamily = partkey->partopfamily[i];
+ Oid partcoll = partkey->partcollation[i];
+ Oid commutator = InvalidOid;
+ AttrNumber partattno = partkey->partattrs[i];
+ Expr *partexpr = NULL;
+
+ /*
+ * A zero attno means the partition key is an expression, so grab
+ * the next expression from the list.
+ */
+ if (partattno == 0)
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ partexpr = (Expr *) lfirst(partexprs_item);
+
+ /*
+ * Expressions stored for the PartitionKey in the relcache are
+ * all stored with the dummy varno of 1. Change that to what
+ * we need.
+ */
+ if (rt_index != 1)
+ {
+ /* make a copy so as not to overwrite the relcache */
+ partexpr = (Expr *) copyObject(partexpr);
+ ChangeVarNodes((Node *) partexpr, 1, rt_index, 0);
+ }
+
+ partexprs_item = lnext(partexprs_item);
+ }
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ valueexpr = rightop;
+ else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * The clause is also useless if its collation differs from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm that the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list-partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ partkey->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ /*
+ * The clause is also useless if its collation differs from
+ * the partitioning collation.
+ */
+ if (!COLLATION_MATCH(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is
+ * not listed in any operator family. We can still handle it
+ * if its negator is the equality operator of the partitioning
+ * operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator) || !op_in_opfamily(negator, partopfamily))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ continue;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j; /* separate counter; 'i' is the partition key index */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen, elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (EXPR_MATCHES_PARTKEY(arg, partattno, partexpr))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ }
+
+ return partclauseinfo;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal values that we're able to determine.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionKey partkey, PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing a </<= or
+ * >/>= operator, then that constitutes the maximum or minimum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * a tuple as both the minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ partkey->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(partkey, clause, &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(partkey, i, value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing the =
+ * operator for all partition key columns; if so, we don't need the
+ * values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == partkey->partnatts ||
+ partkey->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(PartitionKey key, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (key->strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionKey key, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != key->parttypid[partkeyidx])
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ key->parttypid[partkeyidx], -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support higher-level
+ * code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x > 10 AND x < 0. The function
+ * returns right after finding such a clause and, before returning, sets a
+ * field in context->clauseinfo to inform the caller that we found one.
+ */
+static void
+remove_redundant_clauses(PartitionKey partkey,
+ PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ partkey->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(partkey, i,
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauseinfo->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(partkey, i,
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing one. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* The old one is more restrictive, so keep it. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (partkey->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if the
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(partkey, i,
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(partkey, i,
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to the result list.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of either operand is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(PartitionKey key, int partkeyidx,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(key, partkeyidx,
+ rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid partopfamily = key->partopfamily[partkeyidx];
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the partitions of 'rel' that will need to be scanned for the
+ * given lookup keys.
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(Relation rel, PartScanKeyInfo *keys)
+{
+ /* Return an empty set if there are no partitions. */
+ if (RelationGetPartitionDesc(rel)->nparts == 0)
+ return NULL;
+
+ switch (RelationGetPartitionKey(rel)->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(rel, keys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(rel, keys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(rel, keys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ RelationGetPartitionKey(rel)->strategy);
+ }
+
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(Relation rel, PartScanKeyInfo *keys)
+{
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(rel);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int i;
+
+ Assert(partdesc->nparts > 0);
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partkey->partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partkey, keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, partdesc->nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+ Assert(partkey->partnatts == 1);
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partkey, boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partkey, boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_list_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partkey, boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_list_bsearch() works. If the bound at maxoff exactly
+ * matches maxkeys (is_equal), but maxkeys is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(Relation rel, PartScanKeyInfo *keys)
+{
+ Bitmapset *result = NULL;
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionBoundInfo boundinfo = RelationGetPartitionDesc(rel)->boundinfo;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ Assert(RelationGetPartitionDesc(rel)->nparts > 0);
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partkey->partnatts);
+ eqoff = partition_range_datum_bsearch(partkey, boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch() works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partkey, boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch() would've returned the offset of just
+ * one of those. If minkeys is inclusive, we must decrement minoff until
+ * it reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partkey->partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partkey,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make
+ * maxoff point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partkey, boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partkey->partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partkey,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * as the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but it may not be the case. It might actually be
+ * the upper bound of an unassigned range of values. If so, move
+ * minoff/maxoff to the adjacent bound which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partkey->partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partkey->partnatts ||
+ keys->n_maxkeys < partkey->partnatts)
+ {
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3001c493..2fc54defbd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2127,6 +2127,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5009,6 +5028,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..0631d3dbbd 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,13 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+typedef struct PartitionPruneContext
+{
+ int rt_index;
+ Relation relation;
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
typedef struct PartitionDescData *PartitionDesc;
extern void RelationBuildPartitionDesc(Relation relation);
@@ -73,4 +80,9 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern PartitionClauseInfo *generate_partition_clauses(Relation relation,
+ int rt_index, List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..54c678bb43 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,37 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key. Each matching clause
+ * is stored in the 'keyclauses' list for the partition key index that it was
+ * matched to. Other details are also stored, such as OR clauses and
+ * not-equal (<>) clauses. Nullness properties are also stored.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ NodeTag type;
+
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
--
2.11.0
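
For anyone reviewing the bound-reduction logic of remove_redundant_clauses() in the patch above, the following is a minimal standalone sketch of the same idea for a single integer key. It is not code from the patch: Clause, cmp() and reduce_clauses() are hypothetical names made up for this illustration, and plain C integer comparisons stand in for the operator family lookups that partition_cmp_args() performs. As far as I can tell from the patch, cross-strategy contradictions such as x > 10 AND x < 0 are not detected at this stage, so the sketch does not detect them either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Strategy numbers follow the btree convention: <, <=, =, >=, > are 1..5. */
enum { LT = 1, LE, EQ, GE, GT, NSTRAT = GT };

/* Hypothetical stand-in for PartClause: "key <op> value" on one int key. */
typedef struct Clause
{
    int strategy;
    int value;
} Clause;

/* Evaluate "l <op> r" for the given strategy. */
static bool
cmp(int strategy, int l, int r)
{
    switch (strategy)
    {
        case LT: return l < r;
        case LE: return l <= r;
        case EQ: return l == r;
        case GE: return l >= r;
        case GT: return l > r;
    }
    return false;
}

/*
 * Keep at most one clause per strategy in best[] (indexed by strategy - 1,
 * NULL if none).  Returns false if the clauses are found to be mutually
 * contradictory, in which case best[] contents are unspecified.
 */
static bool
reduce_clauses(const Clause *in, size_t n, const Clause *best[NSTRAT])
{
    int s;

    for (s = 0; s < NSTRAT; s++)
        best[s] = NULL;

    /* Step 1: within each strategy, keep the most restrictive clause. */
    for (size_t i = 0; i < n; i++)
    {
        const Clause *c = &in[i];

        assert(c->strategy >= LT && c->strategy <= GT);
        s = c->strategy - 1;
        if (best[s] == NULL || cmp(c->strategy, c->value, best[s]->value))
            best[s] = c;
        else if (c->strategy == EQ)
            return false;       /* e.g. x = 1 AND x = 2 */
    }

    /* Step 2: an equality clause either contradicts or subsumes the rest. */
    if (best[EQ - 1])
    {
        for (s = 0; s < NSTRAT; s++)
        {
            if (s == EQ - 1 || best[s] == NULL)
                continue;
            if (!cmp(s + 1, best[EQ - 1]->value, best[s]->value))
                return false;   /* e.g. x = 5 AND x < 5 */
            best[s] = NULL;
        }
    }

    /* Step 3: keep only one of <, <= and only one of >, >=. */
    if (best[LT - 1] && best[LE - 1])
    {
        if (cmp(LE, best[LT - 1]->value, best[LE - 1]->value))
            best[LE - 1] = NULL;
        else
            best[LT - 1] = NULL;
    }
    if (best[GT - 1] && best[GE - 1])
    {
        if (cmp(GE, best[GT - 1]->value, best[GE - 1]->value))
            best[GE - 1] = NULL;
        else
            best[GT - 1] = NULL;
    }

    return true;
}
```

Calling reduce_clauses() on the three clauses x > 1, x > 2, x >= 5 leaves only x >= 5 in best[], which matches the example in the function's header comment. The sketch returns a bool instead of setting a constfalse flag only to keep it short.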
v24-0003-Move-some-code-of-set_append_rel_size-to-separat.patch
From ca8ff62aae63cb86bec5219f14d682ffc7588e31 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 13:46:26 +0900
Subject: [PATCH v24 3/5] Move some code of set_append_rel_size to separate
function
Move the code that initializes basic properties of a partition RelOptInfo
from the information in the parent's RelOptInfo into a separate function.
It will need to be called by the pairwise-join related code to minimally
initialize the partitions that earlier planning would have considered
pruned and hence left untouched. That is not the case currently, because
the current pruning method touches each partition (setting its basic
properties) before considering it pruned.
---
src/backend/optimizer/path/allpaths.c | 80 ++-----------------------------
src/backend/optimizer/util/relnode.c | 90 +++++++++++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 ++
3 files changed, 97 insertions(+), 77 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd1a58336b..fd68374e20 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -921,85 +921,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
childrel = find_base_rel(root, childRTindex);
Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
- if (rel->part_scheme)
- {
- AttrNumber attno;
-
- /*
- * We need attr_needed data for building targetlist of a join
- * relation representing join between matching partitions for
- * partition-wise join. A given attribute of a child will be
- * needed in the same highest joinrel where the corresponding
- * attribute of parent is needed. Hence it suffices to use the
- * same Relids set for parent and child.
- */
- for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
- {
- int index = attno - rel->min_attr;
- Relids attr_needed = rel->attr_needed[index];
-
- /* System attributes do not need translation. */
- if (attno <= 0)
- {
- Assert(rel->min_attr == childrel->min_attr);
- childrel->attr_needed[index] = attr_needed;
- }
- else
- {
- Var *var = list_nth_node(Var,
- appinfo->translated_vars,
- attno - 1);
- int child_index;
-
- /*
- * Ignore any column dropped from the parent.
- * Corresponding Var won't have any translation. It won't
- * have attr_needed information, since it can not be
- * referenced in the query.
- */
- if (var == NULL)
- {
- Assert(attr_needed == NULL);
- continue;
- }
-
- child_index = var->varattno - childrel->min_attr;
- childrel->attr_needed[child_index] = attr_needed;
- }
- }
- }
-
- /*
- * Copy/Modify targetlist. Even if this child is deemed empty, we need
- * its targetlist in case it falls on nullable side in a child-join
- * because of partition-wise join.
- *
- * NB: the resulting childrel->reltarget->exprs may contain arbitrary
- * expressions, which otherwise would not occur in a rel's targetlist.
- * Code that might be looking at an appendrel child must cope with
- * such. (Normally, a rel's targetlist would only include Vars and
- * PlaceHolderVars.) XXX we do not bother to update the cost or width
- * fields of childrel->reltarget; not clear if that would be useful.
- */
- childrel->reltarget->exprs = (List *)
- adjust_appendrel_attrs(root,
- (Node *) rel->reltarget->exprs,
- 1, &appinfo);
-
/*
- * We have to make child entries in the EquivalenceClass data
- * structures as well. This is needed either if the parent
- * participates in some eclass joins (because we will want to consider
- * inner-indexscan joins on the individual children) or if the parent
- * has useful pathkeys (because we should try to build MergeAppend
- * paths that produce those sort orderings). Even if this child is
- * deemed dummy, it may fall on nullable side in a child-join, which
- * in turn may participate in a MergeAppend, where we will need the
- * EquivalenceClass data structures.
+ * Initialize some properties of child rel from the parent rel, such as
+ * the target list, equivalence class members, etc.
*/
- if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
- add_child_rel_equivalences(root, appinfo, rel, childrel);
- childrel->has_eclass_joins = rel->has_eclass_joins;
+ set_basic_child_rel_properties(root, rel, childrel, appinfo);
/*
* We have to copy the parent's quals to the child, with appropriate
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ac5a7c9553..35345ccbe9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1748,3 +1748,93 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
joinrel->nullable_partexprs[cnt] = nullable_partexpr;
}
}
+
+/*
+ * Initialize some basic properties of child rel from the parent rel, such as
+ * the target list, equivalence class members, etc.
+ */
+void
+set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo)
+{
+ /*
+ * Copy/Modify targetlist. Even if this child is deemed empty, we need
+ * its targetlist in case it falls on nullable side in a child-join
+ * because of partition-wise join.
+ *
+ * NB: the resulting childrel->reltarget->exprs may contain arbitrary
+ * expressions, which otherwise would not occur in a rel's targetlist.
+ * Code that might be looking at an appendrel child must cope with
+ * such. (Normally, a rel's targetlist would only include Vars and
+ * PlaceHolderVars.) XXX we do not bother to update the cost or width
+ * fields of childrel->reltarget; not clear if that would be useful.
+ */
+ childrel->reltarget->exprs = (List *)
+ adjust_appendrel_attrs(root,
+ (Node *) rel->reltarget->exprs,
+ 1, &appinfo);
+
+ /*
+ * We have to make child entries in the EquivalenceClass data
+ * structures as well. This is needed either if the parent
+ * participates in some eclass joins (because we will want to consider
+ * inner-indexscan joins on the individual children) or if the parent
+ * has useful pathkeys (because we should try to build MergeAppend
+ * paths that produce those sort orderings). Even if this child is
+ * deemed dummy, it may fall on nullable side in a child-join, which
+ * in turn may participate in a MergeAppend, where we will need the
+ * EquivalenceClass data structures.
+ */
+ if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
+ add_child_rel_equivalences(root, appinfo, rel, childrel);
+ childrel->has_eclass_joins = rel->has_eclass_joins;
+
+ if (rel->part_scheme)
+ {
+ AttrNumber attno;
+
+ /*
+ * We need attr_needed data for building targetlist of a join relation
+ * representing join between matching partitions for partition-wise
+ * join. A given attribute of a child will be needed in the same
+ * highest joinrel where the corresponding attribute of parent is
+ * needed. Hence it suffices to use the same Relids set for parent and
+ * child.
+ */
+ for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ {
+ int index = attno - rel->min_attr;
+ Relids attr_needed = rel->attr_needed[index];
+
+ /* System attributes do not need translation. */
+ if (attno <= 0)
+ {
+ Assert(rel->min_attr == childrel->min_attr);
+ childrel->attr_needed[index] = attr_needed;
+ }
+ else
+ {
+ Var *var = list_nth_node(Var,
+ appinfo->translated_vars,
+ attno - 1);
+ int child_index;
+
+ /*
+ * Ignore any column dropped from the parent. Corresponding
+ * Var won't have any translation. It won't have attr_needed
+ * information, since it can not be referenced in the query.
+ */
+ if (var == NULL)
+ {
+ Assert(attr_needed == NULL);
+ continue;
+ }
+
+ child_index = var->varattno - childrel->min_attr;
+ childrel->attr_needed[child_index] = attr_needed;
+ }
+ }
+ }
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index ef7173fbf8..142eecd733 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -301,5 +301,9 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo, JoinType jointype);
+extern void set_basic_child_rel_properties(PlannerInfo *root,
+ RelOptInfo *rel,
+ RelOptInfo *childrel,
+ AppendRelInfo *appinfo);
#endif /* PATHNODE_H */
--
2.11.0
Attachment: v24-0004-More-refactoring-around-partitioned-table-Append.patch (text/plain; charset=UTF-8)
From 5227fd28cc890e3595c78dfabf82f9b3b5e02fe2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v24 4/5] More refactoring around partitioned table AppendPath
creation
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
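To make the first point concrete, here is a minimal sketch of the two lookup
strategies. It uses toy stand-ins (ToyAppInfo, toy_collect_by_scan,
toy_collect_live are hypothetical names, not PostgreSQL structs or
functions) and assumes only what is described above: the old code filters a
global list by parent RT index, while the new code reads a per-parent array
kept in partition order and keeps just the live entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for AppendRelInfo: only the two RT indexes matter here. */
typedef struct ToyAppInfo
{
	int			parent_relid;
	int			child_relid;
} ToyAppInfo;

/*
 * Old approach: walk the global list of all appinfos and keep those whose
 * parent matches.  The cost is proportional to the size of the global list,
 * which also holds children of unrelated parents.
 */
static int
toy_collect_by_scan(ToyAppInfo *all, int nall, int parent_relid,
					ToyAppInfo **out)
{
	int			n = 0;

	assert(all != NULL && out != NULL);
	for (int i = 0; i < nall; i++)
	{
		if (all[i].parent_relid == parent_relid)
			out[n++] = &all[i];
	}
	return n;
}

/*
 * New approach: the parent already holds one entry per partition, in
 * partition order, so we only copy out the entries marked live.
 */
static int
toy_collect_live(ToyAppInfo **part_appinfos, const bool *is_live,
				 int nparts, ToyAppInfo **out)
{
	int			n = 0;

	assert(part_appinfos != NULL && is_live != NULL && out != NULL);
	for (int i = 0; i < nparts; i++)
	{
		if (is_live[i])
			out[n++] = part_appinfos[i];
	}
	return n;
}
```

In the sketch the per-parent array is indexed the same way as the partition
bounds, which is the property the patch relies on for part_appinfos and
part_rels.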
---
src/backend/optimizer/path/allpaths.c | 120 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 19 ++++--
src/backend/optimizer/util/relnode.c | 14 ++++
src/include/nodes/relation.h | 25 ++++++-
4 files changed, 122 insertions(+), 56 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd68374e20..8f761a77e8 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -861,6 +861,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ List *rel_appinfos = NIL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -874,6 +875,27 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ {
+ int i;
+
+ for (i = 0; i < rel->nparts; i++)
+ rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -894,7 +916,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -907,10 +929,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
ListCell *childvars;
ListCell *lc;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1090,6 +1108,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
/* We have at least one live child. */
has_live_children = true;
+ /* Add this child as a live partition of the parent. */
+ rel->live_part_appinfos = lappend(rel->live_part_appinfos, appinfo);
+
/*
* If any live child is not parallel-safe, treat the whole appendrel
* as not parallel-safe. In future we might be able to generate plans
@@ -1186,24 +1207,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
- List *live_childrels = NIL;
+ List *rel_appinfos = NIL,
+ *live_childrels = NIL;
ListCell *l;
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ {
+ foreach (l, root->append_rel_list)
+ {
+ AppendRelInfo *appinfo = lfirst(l);
+
+ /* append_rel_list contains all append rels; ignore others */
+ if (appinfo->parent_relid == parentRTindex)
+ rel_appinfos = lappend(rel_appinfos, appinfo);
+ }
+ }
+ else
+ rel_appinfos = rel->live_part_appinfos;
+
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach(l, rel_appinfos)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
- /* append_rel_list contains all append rels; ignore others */
- if (appinfo->parent_relid != parentRTindex)
- continue;
-
/* Re-locate the child RTE and RelOptInfo */
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
@@ -1267,44 +1299,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1322,17 +1349,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b6c8..a81fed6d1d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5839,14 +5839,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 35345ccbe9..f3b9a2be32 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->part_appinfos = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_part_appinfos = NIL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -233,8 +236,12 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
int cnt_parts = 0;
if (nparts > 0)
+ {
+ rel->part_appinfos = (AppendRelInfo **)
+ palloc(sizeof(AppendRelInfo *) * nparts);
rel->part_rels = (RelOptInfo **)
palloc(sizeof(RelOptInfo *) * nparts);
+ }
foreach(l, root->append_rel_list)
{
@@ -258,6 +265,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
* also match the PartitionDesc. See expand_partitioned_rtentry.
*/
Assert(cnt_parts < nparts);
+ rel->part_appinfos[cnt_parts] = appinfo;
rel->part_rels[cnt_parts] = childrel;
cnt_parts++;
}
@@ -567,9 +575,12 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -734,9 +745,12 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->part_appinfos = NULL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_part_appinfos = NIL;
+ joinrel->live_partitioned_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6bf68f31da..25333c5407 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -529,8 +529,12 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * part_appinfos - AppendRelInfo of each partition
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * live_part_appinfos - AppendRelInfo of unpruned partitions
+ * live_partitioned_rels - RT indexes of unpruned partitions that are
+ * partitioned tables themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -657,10 +661,27 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
- struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
- * stored in the same order of bounds */
+ struct AppendRelInfo **part_appinfos; /* Array of AppendRelInfos of
+ * partitions, stored in the
+ * same order as the bounds */
+ struct RelOptInfo **part_rels; /* Array of RelOptInfos of *all*
+ * partitions, stored in the same order as
+ * the bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+
+ /*
+ * List of AppendRelInfo's of the table's partitions that survive a
+ * query's clauses.
+ */
+ List *live_part_appinfos;
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
Attachment: v24-0005-Teach-planner-to-use-get_partitions_from_clauses.patch (text/plain; charset=UTF-8)
From 6d66b8fe75970c1f36881e584205f9f7b354cc7b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 12 Dec 2017 16:17:10 +0900
Subject: [PATCH v24 5/5] Teach planner to use get_partitions_from_clauses()
The current method of selecting a table's partitions to be scanned
applies constraint exclusion against the partition constraint of each
partition: it compares the query's clauses against the partition
constraint and excludes the partition if the clauses refute it. A
dummy path is added for each partition that is excluded. This
algorithm takes linear time with a big constant, especially given that
we repeat the work of matching clauses to the partition constraint for
every partition.
Instead, we can match clauses only once by comparing them against
the (parent) table's partition key using populate_partition_clauses().
Then, if we pass the clauses to get_partitions_from_clauses(), we'll
get the set of matching partitions in much less time than by running
the matching algorithm separately for each partition.
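As a rough illustration of why matching against the parent's bounds is
cheaper, consider the toy model below. It is only a sketch under stated
assumptions and not the algorithm in get_partitions_from_clauses(): it
assumes a single integer key, range partitions where partition i covers
[bounds[i], bounds[i+1]), and one equality clause. The names
toy_prune_linear and toy_prune_bsearch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-partition check, analogous in spirit to testing the clause against
 * each partition's own constraint: O(nparts) comparisons.
 * Returns the index of the matching partition, or -1 if none matches.
 */
static int
toy_prune_linear(const int *bounds, int nparts, int key)
{
	assert(bounds != NULL);
	for (int i = 0; i < nparts; i++)
	{
		if (key >= bounds[i] && key < bounds[i + 1])
			return i;
	}
	return -1;
}

/*
 * Check against the parent's sorted bound array once, using binary search:
 * O(log nparts) comparisons for the same answer.
 */
static int
toy_prune_bsearch(const int *bounds, int nparts, int key)
{
	int			lo = 0;
	int			hi = nparts;

	assert(bounds != NULL);
	if (nparts <= 0 || key < bounds[0] || key >= bounds[nparts])
		return -1;

	/* Invariant: bounds[lo] <= key < bounds[hi]. */
	while (hi - lo > 1)
	{
		int			mid = lo + (hi - lo) / 2;

		if (key >= bounds[mid])
			lo = mid;
		else
			hi = mid;
	}
	return lo;
}
```

Both functions return the same partition index for any key; the difference
is only in how much work is done per query as the number of partitions
grows.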
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/optimizer/path/allpaths.c | 80 ++++-
src/backend/optimizer/path/joinrels.c | 24 ++
src/backend/optimizer/util/plancat.c | 41 ++-
src/include/nodes/relation.h | 6 +-
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 430 +++++++++++++++++++++++---
src/test/regress/sql/partition_prune.sql | 77 ++++-
7 files changed, 592 insertions(+), 76 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8f761a77e8..f6a4e3dc2f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -136,6 +137,9 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels);
+static List *get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte);
/*
@@ -847,6 +851,77 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
}
/*
+ * get_append_rel_partitions
+ * Returns a List of AppendRelInfo belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+static List *
+get_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel,
+ RangeTblEntry *rte)
+{
+ List *result = NIL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (!clauses)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = lappend(result, rel->part_appinfos[i]);
+ }
+ else
+ {
+ Relation partrel;
+ Bitmapset *partindexes;
+ PartitionClauseInfo *partclauseinfo;
+
+ partrel = heap_open(rte->relid, NoLock);
+
+ /* process clauses and generate the partclauseinfo */
+ partclauseinfo = generate_partition_clauses(partrel, rel->relid,
+ clauses);
+
+ if (!partclauseinfo->constfalse)
+ {
+ PartitionPruneContext context;
+
+ context.rt_index = rel->relid;
+ context.relation = partrel;
+ context.clauseinfo = partclauseinfo;
+
+ partindexes = get_partitions_from_clauses(&context);
+
+ /* Fetch the partition appinfos. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ {
+ AppendRelInfo *appinfo = rel->part_appinfos[i];
+
+#ifdef USE_ASSERT_CHECKING
+ PartitionDesc partdesc = RelationGetPartitionDesc(partrel);
+ RangeTblEntry *childrte;
+
+ childrte = planner_rt_fetch(appinfo->child_relid, root);
+
+ /*
+ * Must be the intended child's RTE here, because appinfos are ordered
+ * the same way as partitions in the partition descriptor.
+ */
+ Assert(partdesc->oids[i] == childrte->relid);
+#endif
+ result = lappend(result, appinfo);
+ }
+ }
+
+ heap_close(partrel, NoLock);
+ }
+
+ return result;
+}
+
+/*
* set_append_rel_size
* Set size estimates for a simple "append relation"
*
@@ -888,10 +963,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- int i;
-
- for (i = 0; i < rel->nparts; i++)
- rel_appinfos = lappend(rel_appinfos, rel->part_appinfos[i]);
+ rel_appinfos = get_append_rel_partitions(root, rel, rte);
rel->live_partitioned_rels = list_make1_int(rti);
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a35d068911..6949886e46 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1395,6 +1395,30 @@ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
child_rel2->relids);
/*
+ * If either child_rel1 or child_rel2 is not a live partition, it would
+ * not have been touched by set_append_rel_size, so its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid, with a valid dummy path attached to it.
+ */
+ if (IS_SIMPLE_REL(child_rel1) && child_rel1->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel1->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel1, child_rel1, appinfo);
+ mark_dummy_rel(child_rel1);
+ }
+
+ if (IS_SIMPLE_REL(child_rel2) && child_rel2->pathlist == NIL)
+ {
+ AppendRelInfo *appinfo = rel2->part_appinfos[cnt_parts];
+
+ set_basic_child_rel_properties(root, rel2, child_rel2, appinfo);
+ mark_dummy_rel(child_rel2);
+ }
+
+ /*
* Construct restrictions applicable to the child join from those
* applicable to the parent join.
*/
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..c1d4c7db5b 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1930,6 +1939,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 25333c5407..5e1d4151c2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -342,6 +342,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -349,7 +352,8 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *parttypcoll; /* OIDs of partition key type collation. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
int16 *parttyplen;
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
Hi David.
On 2018/02/01 8:57, David Rowley wrote:
On 31 January 2018 at 21:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Update patch set attached. Thanks again.
(My apologies for being slow to respond here. I've been on leave this
week and I'm off again next week. I now have a little time to reply)
No worries.
Thanks for incorporating my changes into the patchset. A while ago I
was rebasing the run-time pruning patch on top of this but ran into a
few problems which are all results of my changes.
1. remove_redundant_clauses collapses the PartClause list into the
most restrictive set of clauses. This disallows multiple evaluations
of the PartitionClauseInfo during runtime pruning. I've written a
proposed fix for this and attached it.
I've incorporated it in the latest patch I posted today.
2. PartitionClauseInfo->keyclauses is a list of PartClause which is
not a node type. This will cause _copyPartitionClauseInfo() to fail.
I'm still not quite sure of the best way to fix #2 since PartClause
contains a FmgrInfo. I do have a local fix which moves PartClause to
primnodes.h and makes it a proper node type. I also added a copy
function which does not copy any of the cache fields in PartClause. It
just sets valid_cache to false. I didn't particularly think this was
the correct fix. I just couldn't think of how exactly this should be
done at the time.
The attached patch also adds the missing nodetag to
PartitionClauseInfo and also fixes up other code so that we don't memset
the node memory to zero, as that would destroy the tag. I ended up
just having extract_partition_key_clauses do the makeNode call. This
also resulted in populate_partition_clauses being renamed to
generate_partition_clauses.
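The copy approach described under #2 can be sketched in plain C as follows. This is only an illustration under stated assumptions: the struct PartClauseSketch, every field in it other than valid_cache, and the function copy_part_clause_sketch are hypothetical stand-ins, not the definitions from the patch, and malloc stands in for the backend's own allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for PartClause. Only valid_cache is named in the
 * thread; the other fields are assumptions made for this sketch.
 */
typedef struct PartClauseSketch
{
    int   op_strategy;  /* assumed planner-visible field */
    long  constarg;     /* assumed comparison value */
    bool  valid_cache;  /* true once the cached field below is filled in */
    void *cached_fn;    /* stands in for a cached function lookup (FmgrInfo) */
} PartClauseSketch;

/*
 * Copy only the non-cache fields. The copy starts with an invalid cache,
 * so whoever uses it must redo the lookup before relying on cached_fn.
 */
static PartClauseSketch *
copy_part_clause_sketch(const PartClauseSketch *from)
{
    PartClauseSketch *newnode;

    assert(from != NULL);

    newnode = malloc(sizeof(PartClauseSketch));
    if (newnode == NULL)
        return NULL;

    newnode->op_strategy = from->op_strategy;
    newnode->constarg = from->constarg;
    newnode->valid_cache = false;   /* never carry the cache over */
    newnode->cached_fn = NULL;

    return newnode;
}
```

The point of the sketch is only that the cache fields are deliberately left out of the copy, which sidesteps having to copy something like a FmgrInfo at the cost of one extra lookup on the copied node.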
I started wondering if it's not such a good idea to make
PartitionClauseInfo a Node at all? I went back to your earlier message
[1]: /messages/by-id/CAKJS1f8T_efuAgPWtyGdfwD1kBLR-giFvoez7raYsQ4P1i2OYw@mail.gmail.com
to use, but it doesn't sound nice that we'd be putting into the plan
something that looks more like a scratchpad for the partition.c code. I
think we should try to keep PartitionClauseInfo in partition.h and put
only the list of matched bare clauses into Append.
Thanks,
Amit
[1]: /messages/by-id/CAKJS1f8T_efuAgPWtyGdfwD1kBLR-giFvoez7raYsQ4P1i2OYw@mail.gmail.com
On Fri, Feb 2, 2018 at 1:04 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Your proposed cleanup sounds much better, so I implemented it in the
attached updated 0001, while dropping the previously proposed
PartitionBoundCmpArg approach.
Updated set of patches attached (patches 0002 onward mostly unchanged,
except I incorporated the delta patch posted by David upthread).
Committed 0001. Thanks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Feb 2, 2018 at 9:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Updated set of patches attached (patches 0002 onward mostly unchanged,
except I incorporated the delta patch posted by David upthread).
Committed 0001. Thanks.
Some preliminary thoughts...
Regarding 0002, I can't help noticing that this adds a LOT of new code
to partition.c. With 0002 applied, it climbs into the top 2% of all
".c" files in terms of lines of code. It seems to me, though, that
maybe it would make sense to instead add all of this code to some new
.c file, e.g. src/backend/optimizer/util/partprune.c. I realize
that's a little awkward in this case because we're hoping to use this
code at runtime and not just in the optimizer, but I don't have a
better idea. Using src/backend/catalog as a dumping-ground for code
that doesn't have a clear-cut place to live is not a superior
alternative, for sure. And it seems to me that the code you're adding
here is really quite similar to what we've already got in that
directory -- for example, predtest.c, which currently does partition
pruning, lives there; so does clauses.c, whose evaluate_expr facility
this patch wants to use; so does relnode.c, which the other patches
modify; and in general this looks an awful lot like other optimizer
logic that decomposes clauses. I'm open to other suggestions but I
don't think adding all of this directly into partition.c is a good
plan.
If we do add a new file for this code, the header comment for that
file would be a good place to write an overall explanation of this new
facility. The individual bits and pieces seem to have good comments
but an overall explanation of what's going on here seems to be
lacking.
It doesn't seem good that get_partitions_from_clauses requires us to
reopen the relation. I'm going to give my standard review feedback
any time someone injects additional relation_open or heap_open calls
into a patch: please look for a way to piggyback on one of the places
that already has the relation open instead of adding code to re-open
it elsewhere. Reopening it is not entirely free, and, especially when
NoLock is used, makes it hard to tell whether we're doing the locking
correctly. Note that we've already got things like
set_relation_partition_info (which copies the bounds) and
set_baserel_partition_key_exprs (which copies, with some partitioning,
the partitioning expressions) and also find_partition_scheme, but
instead of using that existing data from the RelOptInfo, this patch is
digging it directly out of the relcache entry, which doesn't seem
great.
The changes to set_append_rel_pathlist probably make it slower in the
case where rte->relkind != RELKIND_PARTITIONED_TABLE. We build a
whole new list that we don't really need. How about just keeping the
(appinfo->parent_relid != parentRTindex) test in the loop and setting
rel_appinfos to either root->append_rel_list or
rel->live_part_appinfos as appropriate?
I understand why COLLATION_MATCH thinks that a collation OID match is
OK, but why is InvalidOid also OK? Can you add a comment? Maybe some
test cases, too?
How fast is this patch these days, compared with the current approach?
It would be good to test both when nearly all of the partitions are
pruned and when almost none of the partitions are pruned.
--
Robert Haas
On 2018/02/03 6:05, Robert Haas wrote:
On Fri, Feb 2, 2018 at 9:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Updated set of patches attached (patches 0002 onward mostly unchanged,
except I incorporated the delta patch posted by David upthread).
Committed 0001. Thanks.
Some preliminary thoughts...
Thanks for the review.
Regarding 0002, I can't help noticing that this adds a LOT of new code
to partition.c. With 0002 applied, it climbs into the top 2% of all
".c" files in terms of lines of code. It seems to me, though, that
maybe it would make sense to instead add all of this code to some new
.c file, e.g. src/backend/optimizer/util/partprune.c. I realize
that's a little awkward in this case because we're hoping to use this
code at runtime and not just in the optimizer, but I don't have a
better idea. Using src/backend/catalog as a dumping-ground for code
that doesn't have a clear-cut place to live is not a superior
alternative, for sure.
Agreed. partition.c has gotten quite big and actually more than half of
the code that this patch adds really seems to belong outside of it.
And it seems to me that the code you're adding
here is really quite similar to what we've already got in that
directory -- for example, predtest.c, which currently does partition
pruning, lives there; so does clauses.c, whose evaluate_expr facility
this patch wants to use; so does relnode.c, which the other patches
modify; and in general this looks an awful lot like other optimizer
logic that decomposes clauses. I'm open to other suggestions but I
don't think adding all of this directly into partition.c is a good
plan.
Agreed.
A partprune.c in the optimizer's util directory seems like a good place.
If we do add a new file for this code, the header comment for that
file would be a good place to write an overall explanation of this new
facility. The individual bits and pieces seem to have good comments
but an overall explanation of what's going on here seems to be
lacking.
OK, I will add such a comment.
It doesn't seem good that get_partitions_from_clauses requires us to
reopen the relation. I'm going to give my standard review feedback
any time someone injects additional relation_open or heap_open calls
into a patch: please look for a way to piggyback on one of the places
that already has the relation open instead of adding code to re-open
it elsewhere. Reopening it is not entirely free, and, especially when
NoLock is used, makes it hard to tell whether we're doing the locking
correctly. Note that we've already got things like
set_relation_partition_info (which copies the bounds) and
set_baserel_partition_key_exprs (which copies, with some partitioning,
the partitioning expressions) and also find_partition_scheme, but
instead of using that existing data from the RelOptInfo, this patch is
digging it directly out of the relcache entry, which doesn't seem
great.
OK, I have to admit that the quoted heap_open wasn't quite well thought
out and I'm almost sure that everything should be fine with the
information that set_relation_partition_info() fills in the RelOptInfo.
I'm now going through the patch to try to figure out how to make that work.
The changes to set_append_rel_pathlist probably make it slower in the
case where rte->relkind != RELKIND_PARTITIONED_TABLE. We build a
whole new list that we don't really need. How about just keeping the
(appinfo->parent_relid != parentRTindex) test in the loop and setting
rel_appinfos to either root->append_rel_list or
rel->live_part_appinfos as appropriate?
That's certainly better. Also in set_append_rel_size.
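The suggestion quoted above can be sketched roughly as follows: instead of building a new list, pick which existing list to walk and keep the parent_relid test inside the loop. This is a simplified illustration, not the planner code: AppInfoSketch, count_children_sketch and the array-based lists are hypothetical stand-ins for AppendRelInfo, the loop in set_append_rel_pathlist, and the planner's List structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for AppendRelInfo. */
typedef struct AppInfoSketch
{
    int parent_relid;
    int child_relid;
} AppInfoSketch;

/*
 * Walk either the full append_rel_list or the already-pruned
 * live_part_appinfos, depending on whether the parent is a partitioned
 * table. No third list is allocated; the parent_relid filter stays in the
 * loop so it remains correct for the unfiltered list.
 * Returns the number of children that would be processed.
 */
static int
count_children_sketch(bool is_partitioned,
                      const AppInfoSketch *append_rel_list, size_t n_all,
                      const AppInfoSketch *live_part_appinfos, size_t n_live,
                      int parent_rti)
{
    const AppInfoSketch *rel_appinfos;
    size_t      n;
    int         visited = 0;

    assert(append_rel_list != NULL || n_all == 0);
    assert(live_part_appinfos != NULL || n_live == 0);

    rel_appinfos = is_partitioned ? live_part_appinfos : append_rel_list;
    n = is_partitioned ? n_live : n_all;

    for (size_t i = 0; i < n; i++)
    {
        /* Skip entries that belong to some other parent. */
        if (rel_appinfos[i].parent_relid != parent_rti)
            continue;

        visited++;              /* real code would build child paths here */
    }

    return visited;
}
```

For the non-partitioned case this costs the same as the existing loop, which is the concern raised in the review; the partitioned case simply iterates over fewer entries.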
I understand why COLLATION_MATCH thinks that a collation OID match is
OK, but why is InvalidOid also OK? Can you add a comment? Maybe some
test cases, too?
partcollid == InvalidOid means the partition key is of uncollatable type,
so further checking the collation is unnecessary.
There is a test in partition_prune.sql that covers the failure to prune
when collations don't match for a text partition key.
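The check being discussed can be sketched as below. It assumes, as the question above implies, that COLLATION_MATCH accepts either an InvalidOid partition collation or an exact OID match; the function name collation_match_sketch and the OID values 100 and 950 are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal local definitions so the sketch compiles on its own. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * A partition key of an uncollatable type has no collation (InvalidOid),
 * so any clause collation is acceptable. Otherwise the clause's collation
 * must be exactly the partition key's collation.
 */
static bool
collation_match_sketch(Oid partcollid, Oid exprcollid)
{
    return partcollid == InvalidOid || partcollid == exprcollid;
}

/* Illustrative cases; the OID values are arbitrary. */
static void
collation_match_examples(void)
{
    assert(collation_match_sketch(InvalidOid, InvalidOid)); /* e.g. int key */
    assert(collation_match_sketch(InvalidOid, 100));        /* key has none */
    assert(collation_match_sketch(100, 100));               /* exact match */
    assert(!collation_match_sketch(100, 950));              /* mismatch */
}
```

The second case is the interesting one: the key has no collation while the clause still carries one, which a plain equality test would reject.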
How fast is this patch these days, compared with the current approach?
It would be good to test both when nearly all of the partitions are
pruned and when almost none of the partitions are pruned.
I will include some performance numbers in my next email, which hopefully
should not be later than Friday this week.
Thanks,
Amit
On Tue, Feb 6, 2018 at 4:55 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I understand why COLLATION_MATCH thinks that a collation OID match is
OK, but why is InvalidOid also OK? Can you add a comment? Maybe some
test cases, too?
partcollid == InvalidOid means the partition key is of uncollatable type,
so further checking the collation is unnecessary.
Yeah, but in that case wouldn't BOTH OIDs be InvalidOid, and thus the
equality test would match anyway?
--
Robert Haas
On Tue, Feb 6, 2018 at 3:25 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Agreed. partition.c has gotten quite big and actually more than half of
the code that this patch adds really seems to belong outside of it.
And it seems to me that the code you're adding
here is really quite similar to what we've already got in that
directory -- for example, predtest.c, which currently does partition
pruning, lives there; so does clauses.c, whose evaluate_expr facility
this patch wants to use; so does relnode.c, which the other patches
modify; and in general this looks an awful lot like other optimizer
logic that decomposes clauses. I'm open to other suggestions but I
don't think adding all of this directly into partition.c is a good
plan.
Agreed.
A partprune.c in the optimizer's util directory seems like a good place.
partprune.c looks too much tied to one feature. I am sure that the
functions used for partition pruning can be used by other
optimizations as well.
partition.c seems to have two kinds of functions: 1. those that build and
manage the relcache, create quals from bounds etc., which are of the
metadata management kind; 2. partition bound comparison functions and other
optimizer-related functions. Maybe we should divide the file that
way. The first category of code remains in catalog/ as it is today. The
second category of functions moves to optimizer/.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
partprune.c looks too much tied to one feature. I am sure that the
functions used for partition pruning can be used by other
optimizations as well.
Uh, I don't know about that, this code looks like it does partition
pruning specifically, and nothing else. How else do you think it
could be used?
partition.c seems to have two kinds of functions: 1. those that build and
manage the relcache, create quals from bounds etc., which are of the
metadata management kind; 2. partition bound comparison functions and other
optimizer-related functions. Maybe we should divide the file that
way. The first category of code remains in catalog/ as it is today. The
second category of functions moves to optimizer/.
It would be sensible to separate functions that build and manage data
in the relcache from other functions. I think we should consider
moving the existing functions of that type from partition.c to
src/backend/utils/cache/partcache.c.
--
Robert Haas
On Wed, Feb 7, 2018 at 6:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
partprune.c looks too much tied to one feature. I am sure that the
functions used for partition pruning can be used by other
optimizations as well.
Uh, I don't know about that, this code looks like it does partition
pruning specifically, and nothing else. How else do you think it
could be used?
I didn't have any specific thing in mind, and the more I think about it,
the less likely it looks that it will be used for something else.
While looking at the changes in partition.c I happened to look at the
changes in try_partition_wise_join(). They mark partitions deemed
dummy by pruning as dummy relations. If we accept those changes, we
could very well change the way we handle dummy partitioned tables,
which would mean that we also revert the recent commit
f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
haven't been reviewed yet and so are not final.
--
Best Wishes,
Ashutosh Bapat
On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
While looking at the changes in partition.c I happened to look at the
changes in try_partition_wise_join(). They mark partitions deemed
dummy by pruning as dummy relations. If we accept those changes, we
could very well change the way we handle dummy partitioned tables,
which would mean that we also revert the recent commit
f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
haven't been reviewed yet and so not final.
Well, if you have an opinion on those proposed changes, I'd like to hear it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/07 1:36, Robert Haas wrote:
On Tue, Feb 6, 2018 at 4:55 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I understand why COLLATION_MATCH thinks that a collation OID match is
OK, but why is InvalidOid also OK? Can you add a comment? Maybe some
test cases, too?
partcollid == InvalidOid means the partition key is of uncollatable type,
so further checking the collation is unnecessary.
Yeah, but in that case wouldn't BOTH OIDs be InvalidOid, and thus the
equality test would match anyway?
It seems that that's not necessarily true. I remember copying that
logic from the following macro in indxpath.c:
#define IndexCollMatchesExprColl(idxcollation, exprcollation) \
((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
which was added by the following commit:
commit cb37c291060dd13b1a8ff61fceee09efcfbc34e1
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Sep 29 00:43:42 2011 -0400
Fix index matching for operators with mixed collatable/noncollatable inputs.
If an indexable operator for a non-collatable indexed datatype has a
collatable right-hand input type, any OpExpr for it will be marked with a
nonzero inputcollid (since having one collatable input is sufficient to
make that happen). However, an index on a non-collatable column certainly
doesn't have any collation. This caused us to fail to match such
operators to their indexes, because indxpath.c required an exact match of
index collation and clause collation. It seems correct to allow a match
when the index is collation-less regardless of the clause's inputcollid:
an operator with both noncollatable and collatable inputs could perhaps
depend on the collation of the collatable input, but it could hardly
expect the index for the noncollatable input to have that same collation.
[ ... ]
+ * If the index has a collation, the clause must have the same collation.
+ * For collation-less indexes, we assume it doesn't matter; this is
+ * necessary for cases like "hstore ? text", wherein hstore's operators
+ * don't care about collation but the clause will get marked with a
+ * collation anyway because of the text argument. (This logic is
+ * embodied in the macro IndexCollMatchesExprColl.)
+ *
Discussion leading to the above commit occurred here:
/messages/by-id/201109282050.p8SKoA4O084649@wwwmaster.postgresql.org
It seems that we can think similarly for partitioning and let
partition pruning proceed with a clause even if the partition key is
non-collatable whereas the clause's other argument is collatable, even
though it seems we don't yet allow the kind of partitioning that would
lead to such a situation.
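To make the rule concrete, here is a minimal sketch of what a partition-key analogue of IndexCollMatchesExprColl could look like. The macro name PartCollMatchesExprColl and the helper clause_collation_ok are hypothetical names chosen for illustration, not something taken from the patch, and Oid/InvalidOid are redefined locally only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for PostgreSQL's Oid and InvalidOid (assumption: the real
 * definitions come from postgres_ext.h; redefined so this compiles alone). */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical analogue of IndexCollMatchesExprColl for partition keys.
 * A collation-less partition key matches any clause collation; otherwise
 * the two collations must be identical.
 */
#define PartCollMatchesExprColl(partcoll, exprcoll) \
	((partcoll) == InvalidOid || (partcoll) == (exprcoll))

/* Function wrapper so the rule can be called and tested like normal code. */
bool
clause_collation_ok(Oid partcoll, Oid exprcoll)
{
	return PartCollMatchesExprColl(partcoll, exprcoll);
}
```

Note that the test is deliberately asymmetric: a collatable partition key paired with a clause whose collation is InvalidOid does not match.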
Thanks,
Amit
Robert Haas wrote:
On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
partition.c seems to have two kinds of functions 1. that build and
manage relcache, creates quals from bounds etc. which are metadata
management kind 2. partition bound comparison functions, and other
optimizer related functions. Maybe we should divide the file that
way. The first category code remains in catalog/ as it is today. The
second category functions move to optimizer/.
It would be sensible to separate functions that build and manage data
in the relcache from other functions. I think we should consider
moving the existing functions of that type from partition.c to
src/backend/utils/cache/partcache.c.
FWIW I've been thinking that perhaps we need some other separation of
code better than the status quo. The current partition.c file includes stuff
for several modules and ISTM all these new patches are making more and
more of a mess. So +1 to the general idea of splitting things up.
Maybe partcache.c is not ambitious enough, but it seems a good first
step.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
While looking at the changes in partition.c I happened to look at the
changes in try_partition_wise_join(). They mark partitions deemed
dummy by pruning as dummy relations. If we accept those changes, we
could very well change the way we handle dummy partitioned tables,
which would mean that we also revert the recent commit
f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
haven't been reviewed yet and so not final.
Well, if you have an opinion on those proposed changes, I'd like to hear it.
I am talking about changes after this comment:
/*
+ * If either child_rel1 or child_rel2 is not a live partition, they'd
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
There are a couple of problems with this change:
1. An N way join may call try_partition_wise_join() with the same base
relation on one side N times. The condition will be tried those many
times.
2. We will have to adjust or make similar changes in
try_partition_wise_aggregate() proposed in the partition-wise
aggregate patch. Right now it checks if the relation is dummy but it
will have to check whether the pathlist is also NULL. Any
partition-wise operation that we try in future will need this
adjustment.
AFAIU, for pruned partitions, we don't set necessary properties of the
corresponding RelOptInfo when it is pruned. If we were sure that we
will not use that RelOptInfo anywhere in the rest of the planning,
this would work. But that's not the case. AFAIU, current planner
assumes that a relation which has not been eliminated before planning
(DEAD relation), but later proved to not contribute any rows in the
result, is marked dummy. Partition pruning breaks that assumption and
thus may have other side effects that we do not see right now. We
have a similar problem with dummy partitioned tables, but we have code
in place to avoid looking at the pathlists of their children by not
considering such a partitioned table as partitioned. Maybe we want to
change that too.
Either we add refactoring patches to change the planner so that it
doesn't assume something like that, OR we make sure that the pruned
partition's RelOptInfo has the necessary properties and a dummy pathlist
set. I would vote for the second. The cost is that we spend CPU cycles
marking pruned partitions as dummy even if the dummy pathlist is never
used. Maybe we can avoid setting the dummy pathlist if we can detect that
a pruned partition is guaranteed not to be used, e.g. when the
corresponding partitioned relation does not participate in any join or
other upper planning.
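To illustrate the second option, here is a toy model of marking every pruned partition as dummy right after pruning, so that later partition-wise code only has to ask "is this rel dummy?". ToyRel and the toy_* functions are invented for this sketch; they are not PostgreSQL's RelOptInfo or any planner API, and the real planner attaches an actual dummy path rather than setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planner relation; this is NOT PostgreSQL's RelOptInfo. */
typedef struct ToyRel
{
	bool		pruned;		/* eliminated by partition pruning */
	int			npaths;		/* number of paths attached so far */
	bool		dummy;		/* true once a dummy (empty-result) path is attached */
} ToyRel;

/* The single check that partition-wise callers would like to rely on. */
bool
toy_is_dummy(const ToyRel *rel)
{
	return rel->dummy;
}

/* Attach a dummy path so the rel looks minimally valid to later planning. */
void
toy_mark_dummy(ToyRel *rel)
{
	rel->npaths = 1;
	rel->dummy = true;
}

/*
 * After pruning, mark every pruned rel that is not yet dummy.  Returns the
 * number of rels marked, i.e. the extra work spent even if nobody ever
 * looks at those rels again.
 */
int
toy_finish_pruning(ToyRel *rels, size_t nrels)
{
	int			marked = 0;

	for (size_t i = 0; i < nrels; i++)
	{
		if (rels[i].pruned && !rels[i].dummy)
		{
			toy_mark_dummy(&rels[i]);
			marked++;
		}
	}
	return marked;
}
```

Without the toy_finish_pruning() step, a pruned rel has npaths == 0 but is not flagged as dummy, which is the situation that would force each partition-wise caller to also check for an empty pathlist.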
Apart from that another change that caught my eye is
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
That was voted down by Robert during partition-wise join
implementation. And I agree with him. Any changes around changing that
should change the way we handle AppendRelInfos for all relations, not
just (declarative) partitioned relations.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Hi Ashutosh.
On 2018/02/09 14:09, Ashutosh Bapat wrote:
On Wed, Feb 7, 2018 at 7:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 7, 2018 at 8:37 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
While looking at the changes in partition.c I happened to look at the
changes in try_partition_wise_join(). They mark partitions deemed
dummy by pruning as dummy relations. If we accept those changes, we
could very well change the way we handle dummy partitioned tables,
which would mean that we also revert the recent commit
f069c91a5793ff6b7884120de748b2005ee7756f. But I guess, those changes
haven't been reviewed yet and so not final.
Well, if you have an opinion on those proposed changes, I'd like to hear it.
I am talking about changes after this comment:
/*
+ * If either child_rel1 or child_rel2 is not a live partition, they'd
+ * not have been touched by set_append_rel_size. So, its RelOptInfo
+ * would be missing some information that set_append_rel_size sets for
+ * live partitions, such as the target list, child EQ members, etc.
+ * We need to make the RelOptInfo of even the dead partitions look
+ * minimally valid and as having a valid dummy path attached to it.
+ */
There are a couple of problems with this change:
1. An N way join may call try_partition_wise_join() with the same base
relation on one side N times. The condition will be tried those many
times.
2. We will have to adjust or make similar changes in
try_partition_wise_aggregate() proposed in the partition-wise
aggregate patch. Right now it checks if the relation is dummy but it
will have to check whether the pathlist is also NULL. Any
partition-wise operation that we try in future will need this
adjustment.
AFAIU, for pruned partitions, we don't set necessary properties of the
corresponding RelOptInfo when it is pruned. If we were sure that we
will not use that RelOptInfo anywhere in the rest of the planning,
this would work. But that's not the case. AFAIU, current planner
assumes that a relation which has not been eliminated before planning
(DEAD relation), but later proved to not contribute any rows in the
result, is marked dummy. Partition pruning breaks that assumption and
thus may have other side effects that we do not see right now. We
have a similar problem with dummy partitioned tables, but we have code
in place to avoid looking at the pathlists of their children by not
considering such a partitioned table as partitioned. Maybe we want to
change that too.
Either we add refactoring patches to change the planner so that it
doesn't assume something like that, OR we make sure that the pruned
partition's RelOptInfo has the necessary properties and a dummy pathlist
set. I would vote for the second. The cost is that we spend CPU cycles
marking pruned partitions as dummy even if the dummy pathlist is never
used. Maybe we can avoid setting the dummy pathlist if we can detect that
a pruned partition is guaranteed not to be used, e.g. when the
corresponding partitioned relation does not participate in any join or
other upper planning.
Thanks for the analysis. I agree with all the points of concern, so for
now I have dropped all the changes from my patch that give rise to the
concerns. With the new patch, changes to the existing optimizer code
besides introducing partprune.c in the util directory are pretty thin:
git diff master --stat src/backend/optimizer/
src/backend/optimizer/path/allpaths.c | 16 ++
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1421 +++++++++++
src/backend/optimizer/util/plancat.c | 83 ++++---
src/backend/optimizer/util/relnode.c | 8 +
6 files changed, 1504 insertions(+), 30 deletions(-)
So, no refactoring of the existing optimizer code, just replacing the
partition pruning mechanism with partprune.c functions.
Apart from that another change that caught my eye is
Instead of going through root->append_rel_list to pick up the child
appinfos, store them in an array called part_appinfos that stores
partition appinfos in the same order as RelOptInfos are stored in
part_rels, right when the latter are created.
Further, instead of going through root->pcinfo_list to get the list
of partitioned child rels, which ends up including even the rels
that are pruned by set_append_rel_size(), build up a list of "live"
partitioned child rels and use the same to initialize partitioned_rels
field of AppendPath.
That was voted down by Robert during partition-wise join
implementation. And I agree with him. Any changes around changing that
should change the way we handle AppendRelInfos for all relations, not
just (declarative) partitioned relations.
I removed part_appinfos from the patch. Also, I have made the changes
introducing live_partitioned_rels a separate patch, which we can discuss
independently of the pruning changes.
Will post the latest patch set later in the evening.
Thanks,
Amit
On 2018/02/06 18:55, Amit Langote wrote:
How fast is this patch these days, compared with the current approach?
It would be good to test both when nearly all of the partitions are
pruned and when almost none of the partitions are pruned.
I will include some performance numbers in my next email, which hopefully
should not be later than Friday this week.
Here is the latest set of patches. I can see about 2x speedup in planning
time for various partition counts, although planning time still grows
linearly as the partition count grows (same as with HEAD). Detailed
performance figures follow.
* Partitioned table schema:
H:
create table ht (a int, b int) partition by hash (b);
create table ht_* partition of ht for values with (modulus N, ...)
L:
create table lt (a int, b int) partition by list (b);
create table lt_1 partition of lt for values in (1)
..
create table lt_N partition of lt for values in (N)
R:
create table rt (a int, b int) partition by range (b);
create table rt_1 partition of rt for values from (1) to (<step>)
..
create table rt_N partition of rt for values from ((N-1) * <step>) to (N * <step>)
* Queries
Prunes every partition but 1: select * from table_name where b = 1
Prunes none: select * from table_name where b >= 1
* Planning time in milliseconds (average of 5 runs).
On HEAD:
parts  H-prune  H-noprune  L-prune  L-noprune  R-prune  R-noprune
    8     1.50       1.42     1.60       1.55     1.77       1.75
   16     2.49       2.37     2.32       2.65     3.29       3.07
   32     3.96       4.49     3.83       4.14     5.06       5.70
   64     8.02       7.51     7.14       7.34     9.37      10.02
  128    14.47      14.19    13.31      13.99    18.09      18.86
  256    24.76      27.63    25.59      27.87    34.15      37.19
  512    50.36      55.92    52.56      54.76    69.34      72.55
 1024   102.94     110.59   104.97     110.41   136.89     146.54
Patched:
parts  H-prune  H-noprune  L-prune  L-noprune  R-prune  R-noprune
    8     1.49       0.90     0.87       0.74     0.84       1.09
   16     2.01       1.50     1.42       1.68     1.42       1.41
   32     2.63       2.47     2.08       2.69     2.73       2.81
   64     5.62       4.66     4.45       4.96     4.92       5.08
  128    11.28       9.65     9.00       9.60     8.68       9.91
  256    18.36      18.49    17.11      18.39    17.47      18.43
  512    33.88      36.89    34.06      36.52    34.01      37.26
 1024    66.40      72.75    66.37      73.40    67.06      67.06
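For anyone who wants to turn the two tables into speedup factors, here is a small sketch that divides the HEAD time by the patched time per column. Only the 1024-partition rows are hard-coded, copied from the figures above; the function and array names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NCOLS 6

/*
 * Planning times (ms) for 1024 partitions, copied from the tables above.
 * Column order: H-prune, H-noprune, L-prune, L-noprune, R-prune, R-noprune.
 */
const double head_1024[NCOLS] = {102.94, 110.59, 104.97, 110.41, 136.89, 146.54};
const double patched_1024[NCOLS] = {66.40, 72.75, 66.37, 73.40, 67.06, 67.06};

/* Speedup factor: how many times faster the patched planner is. */
double
speedup(double head_ms, double patched_ms)
{
	return head_ms / patched_ms;
}

/* Smallest per-column speedup across n columns (n must be > 0). */
double
min_speedup(const double *head, const double *patched, size_t n)
{
	double		lowest = speedup(head[0], patched[0]);

	for (size_t i = 1; i < n; i++)
	{
		double		s = speedup(head[i], patched[i]);

		if (s < lowest)
			lowest = s;
	}
	return lowest;
}
```

With these inputs the range-partitioned columns come out a bit above 2x, while the hash and list columns at 1024 partitions land between 1.5x and 1.6x.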
Attached v25 patches.
0001-Modify-bound-comparision-functions-to-accept-mem.patch
This is Ashutosh's patch that he posted on the "advanced partition
matching algorithm for partition-wise join" thread.
0002-Refactor-partition-bound-search-functions.patch
This is similar to 0001. Whereas 0001 modifies just the comparison
functions, this one modifies the partition bound search functions, because
the main pruning patch uses the search functions.
0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch
This adds some of the fields to PartitionScheme that were needed by the
main pruning patch.
The above 3 patches do what they do because we'd like the main pruning
patch to add its functionality by relying on whatever information is made
available in the partitioned table's RelOptInfo.
0004-Faster-partition-pruning.patch
The main patch that adds src/backend/optimizer/util/partprune.c, a module
to provide the functionality that will replace the current approach of
calling relation_excluded_by_constraints() for each partition.
Sorry, but there is still this big TODO here, which I'll try to fix early
next week.
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * TODO: write a longer description of things in this file
0005-Add-only-unpruned-partitioned-child-rels-to-part.patch
This one teaches the planner to put *only* the un-pruned partitioned child
tables into partitioned_rels list of certain plan nodes.
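As a rough illustration of what 0001 and 0002 are after, here is a simplified bound search that takes only the pieces it needs (a comparison function and the sorted bounds) instead of a whole key object. It follows the shape of the list-bound search (greatest bound less than or equal to the probe value), but works on plain ints; toy_cmp_fn, toy_int_cmp and toy_list_bsearch are hypothetical names, standing in for the FmgrInfo/collation arguments and PartitionBoundInfo of the real functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparator type standing in for a partition support function. */
typedef int (*toy_cmp_fn) (int a, int b);

/* Three-way integer comparison: negative, zero or positive. */
int
toy_int_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * Return the index of the greatest bound that is <= value, or -1 if every
 * bound is greater.  *is_equal is set to true only when that bound equals
 * value.  The caller passes the comparator explicitly, so no key struct
 * is needed.
 */
int
toy_list_bsearch(toy_cmp_fn cmp, const int *bounds, int nbounds,
				 int value, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = cmp(bounds[mid], value);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

Because nothing here depends on a relcache-owned key, the same search could be driven from whatever the planner has at hand, which is the point of making the information available through the RelOptInfo.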
Thanks,
Amit
[1]: /messages/by-id/CAFjFpRctst136uN2BvbWLAkS7w278XmKY4_PUB+xk-+NezNq8g@mail.gmail.com
Attachments:
v25-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From 2dbfb3498d86acb3b77c6c8c2e15013d64fb7372 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v25 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c80c7f1a..2a64757584 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1116,8 +1118,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1177,7 +1180,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2814,7 +2820,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2823,6 +2831,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2831,7 +2843,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2841,7 +2853,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2862,8 +2874,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2887,9 +2899,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2903,8 +2920,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2981,7 +2998,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3043,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
v25-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From c5ea310602669dd573d9664f62c916ed3960d0f7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v25 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signature and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2a64757584..dccaa232a9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
@@ -1007,7 +1009,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1083,7 +1085,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1158,7 +1162,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2577,7 +2584,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2593,7 +2602,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2622,11 +2632,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2940,7 +2952,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2955,8 +2967,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2983,7 +2995,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2998,8 +3011,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3028,7 +3040,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3043,8 +3055,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3071,8 +3083,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3270,27 +3281,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
v25-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From f8455ac9c572ebba26f740f3d200f777b4eee5d7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v25 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 41 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,6 +1922,22 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
+ memcpy(part_scheme->parttyplen, partkey->parttyplen,
+ sizeof(int16) * partnatts);
+
+ part_scheme->parttypbyval = (bool *) palloc(sizeof(bool) * partnatts);
+ memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
+ sizeof(bool) * partnatts);
+
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
memcpy(part_scheme->partopfamily, partkey->partopfamily,
sizeof(Oid) * partnatts);
@@ -1926,17 +1946,14 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->partopcintype, partkey->partopcintype,
sizeof(Oid) * partnatts);
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
sizeof(Oid) * partnatts);
- part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
- memcpy(part_scheme->parttyplen, partkey->parttyplen,
- sizeof(int16) * partnatts);
-
- part_scheme->parttypbyval = (bool *) palloc(sizeof(bool) * partnatts);
- memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
- sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
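
For readers skimming the patch above: find_partition_scheme() decides whether two relations share a partition scheme by comparing parallel per-column arrays with memcmp(). Here is a minimal standalone sketch of that matching step. It is only an illustration under stated assumptions: `ToyScheme` and `toy_scheme_matches` are hypothetical names, `Oid` is a local stand-in typedef, and only three of the arrays are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;			/* local stand-in for PostgreSQL's Oid */

/* Hypothetical, cut-down version of a partition scheme. */
typedef struct ToyScheme
{
	int			partnatts;		/* number of partition key columns */
	const Oid  *parttypid;		/* key data types */
	const Oid  *partopfamily;	/* operator families */
	const Oid  *partcollation;	/* partitioning collations */
} ToyScheme;

/*
 * Two schemes match only if every per-column array is bytewise equal.
 * Every array here holds Oids, so the comparison length is
 * sizeof(Oid) * partnatts for each of them.
 */
static bool
toy_scheme_matches(const ToyScheme *a, const ToyScheme *b)
{
	size_t		len;

	if (a->partnatts != b->partnatts)
		return false;

	len = sizeof(Oid) * (size_t) a->partnatts;

	return memcmp(a->parttypid, b->parttypid, len) == 0 &&
		memcmp(a->partopfamily, b->partopfamily, len) == 0 &&
		memcmp(a->partcollation, b->partcollation, len) == 0;
}
```

Two tables with the same key types and operator families but different partitioning collations would not match here, which is the reason for keeping the partitioning collation separate from the column's type collation.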
v25-0004-Faster-partition-pruning.patch
From b57956c7c31aa065b349b3b8c2a6a43fd9d6af6f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v25 4/5] Faster partition pruning
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 669 ++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1403 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 85 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 33 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 430 +++++++-
src/test/regress/sql/partition_prune.sql | 77 +-
18 files changed, 2761 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dccaa232a9..4648c2c92f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,6 +196,15 @@ static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1563,9 +1572,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of partitions that can safely be removed due to
+ * 'ne_clauses' containing not-equal clauses for all possible values
+ * that the partition can contain.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_list_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_list_bsearch() works. If the bound at maxoff exactly
+ * matches maxkey (is_equal), but the maxkey is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch() works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch() would've returned the offset of just one of
+ * those. If minkey is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but that may not be the case. It might actually be
+ * the upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. However, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
+ {
+ for (i = 0; i < partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of indexes of partitions that can safely be removed
+ * due to each such partition's every allowable non-null datum appearing in
+ * a <> operator clause.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate
+ * the entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..a3048e46ef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5020,6 +5039,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e842f93d0..98d7a19dad 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Bitmapset *live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..fcb8d90f48
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1403 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * TODO: write a longer description of things in this file
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns a Bitmapset of the RT indexes of relations belonging to the
+ * minimum set of partitions which must be scanned to satisfy rel's
+ * baserestrictinfo quals.
+ */
+Bitmapset *
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Bitmapset *result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (!clauses)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts;
+
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+
+ context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
+ context.partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ while ((i = bms_first_member(partindexes)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work. */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * Populate minimal clauses with the most restrictive
+ * of clauses from context's partclauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have the values we'd need to eliminate
+ * partitions using get_partitions_for_keys, likely because
+ * context->clauseinfo only contained <> clauses and/or OR
+ * clauses, which are handled further below in this function.
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Select partitions by applying OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = makeNode(PartitionClauseInfo);
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ context->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>' operator, which is not
+ * listed as part of any operator family. We can still handle
+ * it if its negator is the equality operator of the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator) ||
+ !op_in_opfamily(negator, partopfamily))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ continue;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+ int j; /* don't clobber 'i', the partition key index */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+ }
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields are the same as in the parent context, except
+ * clauseinfo, which extract_partition_clauses() sets below.
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the
+ * most restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets a field
+ * in context->clauseinfo to inform the caller that we found such a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauseinfo->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the types of the operands are incompatible
+ * with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ if (clauseinfo->ne_clauses)
+ {
+ keys->ne_datums = (Datum *)
+ palloc0(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5c368321e6..5b5be8fe16 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..0dd6bd3020 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,87 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses found for the corresponding partition
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /* Datum values from clauses containing <> operator */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +154,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..54c678bb43 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,37 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key. Each matching clause
+ * is stored in the 'keyclauses' list for the partition key index that it was
+ * matched to. Other details are also stored, such as OR clauses and
+ * not-equal (<>) clauses. Nullness properties are also stored.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ NodeTag type;
+
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..5c0d469600
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Bitmapset *prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
v25-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch
From 44471a8e2f8cfafef08cd578cece517c547b5af0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v25 5/5] Add only unpruned partitioned child rels to
partitioned_rels
---
src/backend/optimizer/path/allpaths.c | 69 ++++++++++++++++-------------------
src/backend/optimizer/plan/planner.c | 19 +++++++---
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/relation.h | 8 ++++
4 files changed, 56 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 98d7a19dad..0adcfad958 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,7 +878,10 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
/*
* Initialize to compute size estimates for whole append relation.
@@ -1358,44 +1361,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the list
+ * by combining live_partitioned_rels of the component partitioned
+ * tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1413,17 +1411,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it is
+ * itself a partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 740de4957d..3b26bab37b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5975,14 +5975,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5b5be8fe16..ad40ac7f8b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_partitioned_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_partitioned_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..6454954e3b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -542,6 +542,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * live_partitioned_rels - RT indexes of unpruned partitions that are
+ * partitioned tables themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +676,12 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
On 2018/02/09 2:58, Alvaro Herrera wrote:
Robert Haas wrote:
On Wed, Feb 7, 2018 at 3:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
partition.c seems to have two kinds of functions 1. that build and
manage relcache, creates quals from bounds etc. which are metadata
management kind 2. partition bound comparison functions, and other
optimizer related functions. May be we should divide the file that
way. The first category code remains in catalog/ as it is today. The
second category functions move to optimizer/.

It would be sensible to separate functions that build and manage data
in the relcache from other functions. I think we should consider
moving the existing functions of that type from partition.c to
src/backend/utils/cache/partcache.c.

FWIW I've been thinking that perhaps we need some other separation of
code better than statu quo. The current partition.c file includes stuff
for several modules and ISTM all these new patches are making more and
more of a mess. So +1 to the general idea of splitting things up.
Maybe partcache.c is not ambitious enough, but it seems a good first
step.
I agree with the proposed reorganization and with adding a partcache.c,
which I have tried to do in the attached patch.
* The new src/backend/utils/cache/partcache.c contains the functions that
initialize the relcache's partitioning-related fields. Various partition
bound comparison and search functions (and then some) that work off of the
cached information are moved there as well. Also, since we cache the
partition qual, the interface functions RelationGetPartitionQual(Relation)
and get_partition_qual_relid(Oid) are moved too.
* The new src/include/utils/partcache.h contains various struct
definitions that are moved from backend/catalog/partition.c,
include/catalog/partition.h, and include/utils/rel.h. It also contains the
declarations of partcache.c's interface functions.
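To make concrete what "bound search functions that work off of the cached
information" means, here is a minimal standalone sketch of a range bound
lookup. It is not the code in the patch: ToyRangeBoundInfo,
toy_range_bsearch() and toy_partition_for_value() are hypothetical names,
the bounds are plain ints rather than Datums compared via the partition
key's operator class, and the layout (sorted distinct bounds plus an
indexes array with one more entry than there are bounds) is only assumed
to mirror the cached bound info in spirit.

```c
#include <assert.h>

/* Toy stand-in for cached range bound info (hypothetical; ints only). */
typedef struct ToyRangeBoundInfo
{
	int			ndatums;	/* number of distinct finite bounds */
	const int  *datums;		/* the bounds, in increasing order */
	const int  *indexes;	/* ndatums + 1 partition indexes */
} ToyRangeBoundInfo;

/*
 * Return the offset of the greatest bound that is <= value, or -1 if
 * every bound is greater than value.
 */
static int
toy_range_bsearch(const ToyRangeBoundInfo *bi, int value)
{
	int			lo = -1;
	int			hi = bi->ndatums - 1;

	assert(bi->ndatums >= 0);

	while (lo < hi)
	{
		/* Round up so that the loop always makes progress. */
		int			mid = (lo + hi + 1) / 2;

		if (bi->datums[mid] <= value)
			lo = mid;
		else
			hi = mid - 1;
	}

	return lo;
}

/*
 * Map a key value to a partition index.  Entry 0 of indexes covers values
 * below the first bound; entry i + 1 covers values at or above bound i.
 */
static int
toy_partition_for_value(const ToyRangeBoundInfo *bi, int value)
{
	return bi->indexes[toy_range_bsearch(bi, value) + 1];
}
```

For a table laid out like rp in the regression tests (rp0 below 1, rp1 from
1 to 2, rp2 from 2 upward), the toy structure would hold datums = {1, 2}
and indexes = {0, 1, 2}, so the values 0, 1 and 5 map to partition indexes
0, 1 and 2 respectively.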
Thoughts?
Thanks,
Amit
Attachments:
v1-0001-Reorganize-partitioning-code.patch
From 32020e095b13c48ac5ca7c10cdd75512ab1cf781 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 13 Feb 2018 15:59:30 +0900
Subject: [PATCH v1] Reorganize partitioning code
---
src/backend/catalog/partition.c | 3906 ++++++++------------------------
src/backend/executor/execMain.c | 1 -
src/backend/executor/execPartition.c | 1 +
src/backend/optimizer/prep/prepunion.c | 2 +-
src/backend/utils/adt/ruleutils.c | 1 -
src/backend/utils/cache/Makefile | 6 +-
src/backend/utils/cache/partcache.c | 2114 +++++++++++++++++
src/backend/utils/cache/relcache.c | 205 +-
src/include/catalog/partition.h | 41 -
src/include/executor/execPartition.h | 2 +-
src/include/utils/partcache.h | 191 ++
src/include/utils/rel.h | 73 +-
12 files changed, 3301 insertions(+), 3242 deletions(-)
create mode 100644 src/backend/utils/cache/partcache.c
create mode 100644 src/include/utils/partcache.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c80c7f1a..b93768f7c8 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -15,11 +15,7 @@
#include "postgres.h"
-#include "access/hash.h"
-#include "access/heapam.h"
#include "access/htup_details.h"
-#include "access/nbtree.h"
-#include "access/sysattr.h"
#include "catalog/dependency.h"
#include "catalog/indexing.h"
#include "catalog/objectaddress.h"
@@ -52,98 +48,9 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
-#include "utils/ruleutils.h"
+#include "utils/snapmgr.h"
#include "utils/syscache.h"
-/*
- * Information about bounds of a partitioned relation
- *
- * A list partition datum that is known to be NULL is never put into the
- * datums array. Instead, it is tracked using the null_index field.
- *
- * In the case of range partitioning, ndatums will typically be far less than
- * 2 * nparts, because a partition's upper bound and the next partition's lower
- * bound are the same in most common cases, and we only store one of them (the
- * upper bound). In case of hash partitioning, ndatums will be same as the
- * number of partitions.
- *
- * For range and list partitioned tables, datums is an array of datum-tuples
- * with key->partnatts datums each. For hash partitioned tables, it is an array
- * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
- * given partition.
- *
- * The datums in datums array are arranged in increasing order as defined by
- * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
- * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
- * respectively. For range and list partitions this simply means that the
- * datums in the datums array are arranged in increasing order as defined by
- * the partition key's operator classes and collations.
- *
- * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound. In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions. For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
- */
-
-typedef struct PartitionBoundInfoData
-{
- char strategy; /* hash, list or range? */
- int ndatums; /* Length of the datums following array */
- Datum **datums;
- PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
- * NULL for hash and list partitioned
- * tables */
- int *indexes; /* Partition indexes */
- int null_index; /* Index of the null-accepting partition; -1
- * if there isn't one */
- int default_index; /* Index of the default partition; -1 if there
- * isn't one */
-} PartitionBoundInfoData;
-
-#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
-#define partition_bound_has_default(bi) ((bi)->default_index != -1)
-
-/*
- * When qsort'ing partition bounds after reading from the catalog, each bound
- * is represented with one of the following structs.
- */
-
-/* One bound of a hash partition */
-typedef struct PartitionHashBound
-{
- int modulus;
- int remainder;
- int index;
-} PartitionHashBound;
-
-/* One value coming from some (index'th) list partition */
-typedef struct PartitionListValue
-{
- int index;
- Datum value;
-} PartitionListValue;
-
-/* One bound of a range partition */
-typedef struct PartitionRangeBound
-{
- int index;
- Datum *datums; /* range bound datums */
- PartitionRangeDatumKind *kind; /* the kind of each datum */
- bool lower; /* this is the lower (vs upper) bound */
-} PartitionRangeBound;
-
-static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
-static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
- void *arg);
-static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
- void *arg);
-
static Oid get_partition_operator(PartitionKey key, int col,
StrategyNumber strategy, bool *need_relabel);
static Expr *make_partition_op_expr(PartitionKey key, int keynum,
@@ -159,2948 +66,1204 @@ static List *get_qual_for_list(Relation parent, PartitionBoundSpec *spec);
static List *get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
bool for_default);
static List *get_range_nulltest(PartitionKey key);
-static List *generate_partition_qual(Relation rel);
-
-static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
- List *datums, bool lower);
-static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
- int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
- Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums, int n_tuple_datums);
-
-static int partition_list_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- int modulus, int remainder);
-
-static int get_partition_bound_num_indexes(PartitionBoundInfo b);
-static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
/*
- * RelationBuildPartitionDesc
- * Form rel's partition descriptor
+ * check_default_allows_bound
*
- * Not flushed from the cache by RelationClearRelation() unless changed because
- * of addition or removal of partition.
+ * This function checks if there exists a row in the default partition that
+ * would properly belong to the new partition being added. If it finds one,
+ * it throws an error.
*/
void
-RelationBuildPartitionDesc(Relation rel)
+check_default_allows_bound(Relation parent, Relation default_rel,
+ PartitionBoundSpec *new_spec)
{
- List *inhoids,
- *partoids;
- Oid *oids = NULL;
- List *boundspecs = NIL;
- ListCell *cell;
- int i,
- nparts;
- PartitionKey key = RelationGetPartitionKey(rel);
- PartitionDesc result;
- MemoryContext oldcxt;
-
- int ndatums = 0;
- int default_index = -1;
-
- /* Hash partitioning specific */
- PartitionHashBound **hbounds = NULL;
-
- /* List partitioning specific */
- PartitionListValue **all_values = NULL;
- int null_index = -1;
+ List *new_part_constraints;
+ List *def_part_constraints;
+ List *all_parts;
+ ListCell *lc;
- /* Range partitioning specific */
- PartitionRangeBound **rbounds = NULL;
+ new_part_constraints = (new_spec->strategy == PARTITION_STRATEGY_LIST)
+ ? get_qual_for_list(parent, new_spec)
+ : get_qual_for_range(parent, new_spec, false);
+ def_part_constraints =
+ get_proposed_default_constraint(new_part_constraints);
/*
- * The following could happen in situations where rel has a pg_class entry
- * but not the pg_partitioned_table entry yet.
+ * If the existing constraints on the default partition imply that it will
+ * not contain any row that would belong to the new partition, we can
+ * avoid scanning the default partition.
*/
- if (key == NULL)
+ if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
+ {
+ ereport(INFO,
+ (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
+ RelationGetRelationName(default_rel))));
return;
+ }
- /* Get partition oids from pg_inherits */
- inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+ /*
+ * Scan the default partition and its subpartitions, and check for rows
+ * that do not satisfy the revised partition constraints.
+ */
+ if (default_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ all_parts = find_all_inheritors(RelationGetRelid(default_rel),
+ AccessExclusiveLock, NULL);
+ else
+ all_parts = list_make1_oid(RelationGetRelid(default_rel));
- /* Collect bound spec nodes in a list */
- i = 0;
- partoids = NIL;
- foreach(cell, inhoids)
+ foreach(lc, all_parts)
{
- Oid inhrelid = lfirst_oid(cell);
+ Oid part_relid = lfirst_oid(lc);
+ Relation part_rel;
+ Expr *constr;
+ Expr *partition_constraint;
+ EState *estate;
HeapTuple tuple;
- Datum datum;
- bool isnull;
- Node *boundspec;
-
- tuple = SearchSysCache1(RELOID, inhrelid);
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "cache lookup failed for relation %u", inhrelid);
+ ExprState *partqualstate = NULL;
+ Snapshot snapshot;
+ TupleDesc tupdesc;
+ ExprContext *econtext;
+ HeapScanDesc scan;
+ MemoryContext oldCxt;
+ TupleTableSlot *tupslot;
- /*
- * It is possible that the pg_class tuple of a partition has not been
- * updated yet to set its relpartbound field. The only case where
- * this happens is when we open the parent relation to check using its
- * partition descriptor that a new partition's bound does not overlap
- * some existing partition.
- */
- if (!((Form_pg_class) GETSTRUCT(tuple))->relispartition)
+ /* Lock already taken above. */
+ if (part_relid != RelationGetRelid(default_rel))
{
- ReleaseSysCache(tuple);
- continue;
- }
+ part_rel = heap_open(part_relid, NoLock);
+
+ /*
+ * If the partition constraints on default partition child imply
+ * that it will not contain any row that would belong to the new
+ * partition, we can avoid scanning the child table.
+ */
+ if (PartConstraintImpliedByRelConstraint(part_rel,
+ def_part_constraints))
+ {
+ ereport(INFO,
+ (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
+ RelationGetRelationName(part_rel))));
- datum = SysCacheGetAttr(RELOID, tuple,
- Anum_pg_class_relpartbound,
- &isnull);
- Assert(!isnull);
- boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+ heap_close(part_rel, NoLock);
+ continue;
+ }
+ }
+ else
+ part_rel = default_rel;
/*
- * Sanity check: If the PartitionBoundSpec says this is the default
- * partition, its OID should correspond to whatever's stored in
- * pg_partitioned_table.partdefid; if not, the catalog is corrupt.
+ * Only RELKIND_RELATION relations (i.e. leaf partitions) need to be
+ * scanned.
*/
- if (castNode(PartitionBoundSpec, boundspec)->is_default)
+ if (part_rel->rd_rel->relkind != RELKIND_RELATION)
{
- Oid partdefid;
-
- partdefid = get_default_partition_oid(RelationGetRelid(rel));
- if (partdefid != inhrelid)
- elog(ERROR, "expected partdefid %u, but got %u",
- inhrelid, partdefid);
- }
+ if (part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ ereport(WARNING,
+ (errcode(ERRCODE_CHECK_VIOLATION),
+ errmsg("skipped scanning foreign table \"%s\" which is a partition of default partition \"%s\"",
+ RelationGetRelationName(part_rel),
+ RelationGetRelationName(default_rel))));
- boundspecs = lappend(boundspecs, boundspec);
- partoids = lappend_oid(partoids, inhrelid);
- ReleaseSysCache(tuple);
- }
+ if (RelationGetRelid(default_rel) != RelationGetRelid(part_rel))
+ heap_close(part_rel, NoLock);
- nparts = list_length(partoids);
+ continue;
+ }
- if (nparts > 0)
- {
- oids = (Oid *) palloc(nparts * sizeof(Oid));
- i = 0;
- foreach(cell, partoids)
- oids[i++] = lfirst_oid(cell);
+ tupdesc = CreateTupleDescCopy(RelationGetDescr(part_rel));
+ constr = linitial(def_part_constraints);
+ partition_constraint = (Expr *)
+ map_partition_varattnos((List *) constr,
+ 1, part_rel, parent, NULL);
+ estate = CreateExecutorState();
- /* Convert from node to the internal representation */
- if (key->strategy == PARTITION_STRATEGY_HASH)
- {
- ndatums = nparts;
- hbounds = (PartitionHashBound **)
- palloc(nparts * sizeof(PartitionHashBound *));
+ /* Build expression execution states for partition check quals */
+ partqualstate = ExecPrepareExpr(partition_constraint, estate);
- i = 0;
- foreach(cell, boundspecs)
- {
- PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
- lfirst(cell));
+ econtext = GetPerTupleExprContext(estate);
+ snapshot = RegisterSnapshot(GetLatestSnapshot());
+ scan = heap_beginscan(part_rel, snapshot, 0, NULL);
+ tupslot = MakeSingleTupleTableSlot(tupdesc);
- if (spec->strategy != PARTITION_STRATEGY_HASH)
- elog(ERROR, "invalid strategy in partition bound spec");
+ /*
+ * Switch to per-tuple memory context and reset it for each tuple
+ * produced, so we don't leak memory.
+ */
+ oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- hbounds[i] = (PartitionHashBound *)
- palloc(sizeof(PartitionHashBound));
+ while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ {
+ ExecStoreTuple(tuple, tupslot, InvalidBuffer, false);
+ econtext->ecxt_scantuple = tupslot;
- hbounds[i]->modulus = spec->modulus;
- hbounds[i]->remainder = spec->remainder;
- hbounds[i]->index = i;
- i++;
- }
+ if (!ExecCheck(partqualstate, econtext))
+ ereport(ERROR,
+ (errcode(ERRCODE_CHECK_VIOLATION),
+ errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
+ RelationGetRelationName(default_rel))));
- /* Sort all the bounds in ascending order */
- qsort(hbounds, nparts, sizeof(PartitionHashBound *),
- qsort_partition_hbound_cmp);
+ ResetExprContext(econtext);
+ CHECK_FOR_INTERRUPTS();
}
- else if (key->strategy == PARTITION_STRATEGY_LIST)
- {
- List *non_null_values = NIL;
- /*
- * Create a unified list of non-null values across all partitions.
- */
- i = 0;
- null_index = -1;
- foreach(cell, boundspecs)
- {
- PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
- lfirst(cell));
- ListCell *c;
+ MemoryContextSwitchTo(oldCxt);
+ heap_endscan(scan);
+ UnregisterSnapshot(snapshot);
+ ExecDropSingleTupleTableSlot(tupslot);
+ FreeExecutorState(estate);
- if (spec->strategy != PARTITION_STRATEGY_LIST)
- elog(ERROR, "invalid strategy in partition bound spec");
+ if (RelationGetRelid(default_rel) != RelationGetRelid(part_rel))
+ heap_close(part_rel, NoLock); /* keep the lock until commit */
+ }
+}
- /*
- * Note the index of the partition bound spec for the default
- * partition. There's no datum to add to the list of non-null
- * datums for this partition.
- */
- if (spec->is_default)
- {
- default_index = i;
- i++;
- continue;
- }
+/*
+ * get_partition_parent
+ *
+ * Returns inheritance parent of a partition by scanning pg_inherits
+ *
+ * Note: Because this function assumes that the relation whose OID is passed
+ * as an argument will have precisely one parent, it should only be called
+ * when it is known that the relation is a partition.
+ */
+Oid
+get_partition_parent(Oid relid)
+{
+ Form_pg_inherits form;
+ Relation catalogRelation;
+ SysScanDesc scan;
+ ScanKeyData key[2];
+ HeapTuple tuple;
+ Oid result;
- foreach(c, spec->listdatums)
- {
- Const *val = castNode(Const, lfirst(c));
- PartitionListValue *list_value = NULL;
+ catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
- if (!val->constisnull)
- {
- list_value = (PartitionListValue *)
- palloc0(sizeof(PartitionListValue));
- list_value->index = i;
- list_value->value = val->constvalue;
- }
- else
- {
- /*
- * Never put a null into the values array, flag
- * instead for the code further down below where we
- * construct the actual relcache struct.
- */
- if (null_index != -1)
- elog(ERROR, "found null more than once");
- null_index = i;
- }
+ ScanKeyInit(&key[0],
+ Anum_pg_inherits_inhrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relid));
+ ScanKeyInit(&key[1],
+ Anum_pg_inherits_inhseqno,
+ BTEqualStrategyNumber, F_INT4EQ,
+ Int32GetDatum(1));
- if (list_value)
- non_null_values = lappend(non_null_values,
- list_value);
- }
+ scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
+ NULL, 2, key);
- i++;
- }
+ tuple = systable_getnext(scan);
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "could not find tuple for parent of relation %u", relid);
- ndatums = list_length(non_null_values);
+ form = (Form_pg_inherits) GETSTRUCT(tuple);
+ result = form->inhparent;
- /*
- * Collect all list values in one array. Alongside the value, we
- * also save the index of partition the value comes from.
- */
- all_values = (PartitionListValue **) palloc(ndatums *
- sizeof(PartitionListValue *));
- i = 0;
- foreach(cell, non_null_values)
- {
- PartitionListValue *src = lfirst(cell);
+ systable_endscan(scan);
+ heap_close(catalogRelation, AccessShareLock);
- all_values[i] = (PartitionListValue *)
- palloc(sizeof(PartitionListValue));
- all_values[i]->value = src->value;
- all_values[i]->index = src->index;
- i++;
- }
+ return result;
+}
- qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
- qsort_partition_list_value_cmp, (void *) key);
- }
- else if (key->strategy == PARTITION_STRATEGY_RANGE)
- {
- int k;
- PartitionRangeBound **all_bounds,
- *prev;
+/*
+ * get_qual_from_partbound
+ * Given a parser node for a partition bound, return the list of executable
+ * expressions that make up the partition constraint
+ */
+List *
+get_qual_from_partbound(Relation rel, Relation parent,
+ PartitionBoundSpec *spec)
+{
+ PartitionKey key = RelationGetPartitionKey(parent);
+ List *my_qual = NIL;
- all_bounds = (PartitionRangeBound **) palloc0(2 * nparts *
- sizeof(PartitionRangeBound *));
+ Assert(key != NULL);
- /*
- * Create a unified list of range bounds across all the
- * partitions.
- */
- i = ndatums = 0;
- foreach(cell, boundspecs)
- {
- PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
- lfirst(cell));
- PartitionRangeBound *lower,
- *upper;
+ switch (key->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ Assert(spec->strategy == PARTITION_STRATEGY_HASH);
+ my_qual = get_qual_for_hash(parent, spec);
+ break;
- if (spec->strategy != PARTITION_STRATEGY_RANGE)
- elog(ERROR, "invalid strategy in partition bound spec");
+ case PARTITION_STRATEGY_LIST:
+ Assert(spec->strategy == PARTITION_STRATEGY_LIST);
+ my_qual = get_qual_for_list(parent, spec);
+ break;
- /*
- * Note the index of the partition bound spec for the default
- * partition. There's no datum to add to the allbounds array
- * for this partition.
- */
- if (spec->is_default)
- {
- default_index = i++;
- continue;
- }
+ case PARTITION_STRATEGY_RANGE:
+ Assert(spec->strategy == PARTITION_STRATEGY_RANGE);
+ my_qual = get_qual_for_range(parent, spec, false);
+ break;
- lower = make_one_range_bound(key, i, spec->lowerdatums,
- true);
- upper = make_one_range_bound(key, i, spec->upperdatums,
- false);
- all_bounds[ndatums++] = lower;
- all_bounds[ndatums++] = upper;
- i++;
- }
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
- Assert(ndatums == nparts * 2 ||
- (default_index != -1 && ndatums == (nparts - 1) * 2));
-
- /* Sort all the bounds in ascending order */
- qsort_arg(all_bounds, ndatums,
- sizeof(PartitionRangeBound *),
- qsort_partition_rbound_cmp,
- (void *) key);
-
- /* Save distinct bounds from all_bounds into rbounds. */
- rbounds = (PartitionRangeBound **)
- palloc(ndatums * sizeof(PartitionRangeBound *));
- k = 0;
- prev = NULL;
- for (i = 0; i < ndatums; i++)
- {
- PartitionRangeBound *cur = all_bounds[i];
- bool is_distinct = false;
- int j;
+ return my_qual;
+}
- /* Is the current bound distinct from the previous one? */
- for (j = 0; j < key->partnatts; j++)
- {
- Datum cmpval;
+/*
+ * map_partition_varattnos - maps varattno of any Vars in expr from the
+ * attno's of 'from_rel' to the attno's of 'to_rel' partition, each of which
+ * may be either a leaf partition or a partitioned table, but both of which
+ * must be from the same partitioning hierarchy.
+ *
+ * Even though all of the same column names must be present in all relations
+ * in the hierarchy, and they must also have the same types, the attnos may
+ * be different.
+ *
+ * If found_whole_row is not NULL, *found_whole_row returns whether a
+ * whole-row variable was found in the input expression.
+ *
+ * Note: this will work on any node tree, so really the argument and result
+ * should be declared "Node *". But a substantial majority of the callers
+ * are working on Lists, so it's less messy to do the casts internally.
+ */
+List *
+map_partition_varattnos(List *expr, int fromrel_varno,
+ Relation to_rel, Relation from_rel,
+ bool *found_whole_row)
+{
+ bool my_found_whole_row = false;
- if (prev == NULL || cur->kind[j] != prev->kind[j])
- {
- is_distinct = true;
- break;
- }
+ if (expr != NIL)
+ {
+ AttrNumber *part_attnos;
- /*
- * If the bounds are both MINVALUE or MAXVALUE, stop now
- * and treat them as equal, since any values after this
- * point must be ignored.
- */
- if (cur->kind[j] != PARTITION_RANGE_DATUM_VALUE)
- break;
-
- cmpval = FunctionCall2Coll(&key->partsupfunc[j],
- key->partcollation[j],
- cur->datums[j],
- prev->datums[j]);
- if (DatumGetInt32(cmpval) != 0)
- {
- is_distinct = true;
- break;
- }
- }
+ part_attnos = convert_tuples_by_name_map(RelationGetDescr(to_rel),
+ RelationGetDescr(from_rel),
+ gettext_noop("could not convert row type"));
+ expr = (List *) map_variable_attnos((Node *) expr,
+ fromrel_varno, 0,
+ part_attnos,
+ RelationGetDescr(from_rel)->natts,
+ RelationGetForm(to_rel)->reltype,
+ &my_found_whole_row);
+ }
- /*
- * Only if the bound is distinct save it into a temporary
- * array i.e. rbounds which is later copied into boundinfo
- * datums array.
- */
- if (is_distinct)
- rbounds[k++] = all_bounds[i];
+ if (found_whole_row)
+ *found_whole_row = my_found_whole_row;
- prev = cur;
- }
+ return expr;
+}
- /* Update ndatums to hold the count of distinct datums. */
- ndatums = k;
- }
- else
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
- }
+/* Module-local functions */
+
+/*
+ * get_partition_operator
+ *
+ * Return the OID of the operator of the given strategy for a given partition
+ * key column.
+ */
+static Oid
+get_partition_operator(PartitionKey key, int col, StrategyNumber strategy,
+ bool *need_relabel)
+{
+ Oid operoid;
- /* Now build the actual relcache partition descriptor */
- rel->rd_pdcxt = AllocSetContextCreateExtended(CacheMemoryContext,
- RelationGetRelationName(rel),
- MEMCONTEXT_COPY_NAME,
- ALLOCSET_DEFAULT_SIZES);
- oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
+ /*
+ * First check if there exists an operator of the given strategy, with
+ * this column's type as both its lefttype and righttype, in the
+ * partitioning operator family specified for the column.
+ */
+ operoid = get_opfamily_member(key->partopfamily[col],
+ key->parttypid[col],
+ key->parttypid[col],
+ strategy);
- result = (PartitionDescData *) palloc0(sizeof(PartitionDescData));
- result->nparts = nparts;
- if (nparts > 0)
+ /*
+ * If one doesn't exist, we must resort to using an operator in the same
+ * operator family but with the operator class declared input type. It is
+ * OK to do so, because the column's type is known to be binary-coercible
+ * with the operator class input type (otherwise, the operator class in
+ * question would not have been accepted as the partitioning operator
+ * class). We must however inform the caller to wrap the non-Const
+ * expression with a RelabelType node to denote the implicit coercion. It
+ * ensures that the resulting expression structurally matches similarly
+ * processed expressions within the optimizer.
+ */
+ if (!OidIsValid(operoid))
{
- PartitionBoundInfo boundinfo;
- int *mapping;
- int next_index = 0;
-
- result->oids = (Oid *) palloc0(nparts * sizeof(Oid));
-
- boundinfo = (PartitionBoundInfoData *)
- palloc0(sizeof(PartitionBoundInfoData));
- boundinfo->strategy = key->strategy;
- boundinfo->default_index = -1;
- boundinfo->ndatums = ndatums;
- boundinfo->null_index = -1;
- boundinfo->datums = (Datum **) palloc0(ndatums * sizeof(Datum *));
-
- /* Initialize mapping array with invalid values */
- mapping = (int *) palloc(sizeof(int) * nparts);
- for (i = 0; i < nparts; i++)
- mapping[i] = -1;
+ operoid = get_opfamily_member(key->partopfamily[col],
+ key->partopcintype[col],
+ key->partopcintype[col],
+ strategy);
+ if (!OidIsValid(operoid))
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ strategy, key->partopcintype[col], key->partopcintype[col],
+ key->partopfamily[col]);
+ *need_relabel = true;
+ }
+ else
+ *need_relabel = false;
- switch (key->strategy)
- {
- case PARTITION_STRATEGY_HASH:
- {
- /* Modulus are stored in ascending order */
- int greatest_modulus = hbounds[ndatums - 1]->modulus;
+ return operoid;
+}
- boundinfo->indexes = (int *) palloc(greatest_modulus *
- sizeof(int));
+/*
+ * make_partition_op_expr
+ * Returns an Expr for the given partition key column with arg1 and
+ * arg2 as its leftop and rightop, respectively
+ */
+static Expr *
+make_partition_op_expr(PartitionKey key, int keynum,
+ uint16 strategy, Expr *arg1, Expr *arg2)
+{
+ Oid operoid;
+ bool need_relabel = false;
+ Expr *result = NULL;
- for (i = 0; i < greatest_modulus; i++)
- boundinfo->indexes[i] = -1;
+ /* Get the correct btree operator for this partitioning column */
+ operoid = get_partition_operator(key, keynum, strategy, &need_relabel);
- for (i = 0; i < nparts; i++)
- {
- int modulus = hbounds[i]->modulus;
- int remainder = hbounds[i]->remainder;
-
- boundinfo->datums[i] = (Datum *) palloc(2 *
- sizeof(Datum));
- boundinfo->datums[i][0] = Int32GetDatum(modulus);
- boundinfo->datums[i][1] = Int32GetDatum(remainder);
-
- while (remainder < greatest_modulus)
- {
- /* overlap? */
- Assert(boundinfo->indexes[remainder] == -1);
- boundinfo->indexes[remainder] = i;
- remainder += modulus;
- }
-
- mapping[hbounds[i]->index] = i;
- pfree(hbounds[i]);
- }
- pfree(hbounds);
- break;
- }
+ /*
+ * The chosen operator may require the non-Const operand to be coerced;
+ * if so, wrap it in a RelabelType node. See the comment in
+ * get_partition_operator().
+ */
+ if (!IsA(arg1, Const) &&
+ (need_relabel ||
+ key->partcollation[keynum] != key->parttypcoll[keynum]))
+ arg1 = (Expr *) makeRelabelType(arg1,
+ key->partopcintype[keynum],
+ -1,
+ key->partcollation[keynum],
+ COERCE_EXPLICIT_CAST);
- case PARTITION_STRATEGY_LIST:
- {
- boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
-
- /*
- * Copy values. Indexes of individual values are mapped
- * to canonical values so that they match for any two list
- * partitioned tables with same number of partitions and
- * same lists per partition. One way to canonicalize is
- * to assign the index in all_values[] of the smallest
- * value of each partition, as the index of all of the
- * partition's values.
- */
- for (i = 0; i < ndatums; i++)
- {
- boundinfo->datums[i] = (Datum *) palloc(sizeof(Datum));
- boundinfo->datums[i][0] = datumCopy(all_values[i]->value,
- key->parttypbyval[0],
- key->parttyplen[0]);
+ /* Generate the actual expression */
+ switch (key->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ {
+ List *elems = (List *) arg2;
+ int nelems = list_length(elems);
- /* If the old index has no mapping, assign one */
- if (mapping[all_values[i]->index] == -1)
- mapping[all_values[i]->index] = next_index++;
+ Assert(nelems >= 1);
+ Assert(keynum == 0);
- boundinfo->indexes[i] = mapping[all_values[i]->index];
- }
+ if (nelems > 1 &&
+ !type_is_array(key->parttypid[keynum]))
+ {
+ ArrayExpr *arrexpr;
+ ScalarArrayOpExpr *saopexpr;
- /*
- * If null-accepting partition has no mapped index yet,
- * assign one. This could happen if such partition
- * accepts only null and hence not covered in the above
- * loop which only handled non-null values.
- */
- if (null_index != -1)
- {
- Assert(null_index >= 0);
- if (mapping[null_index] == -1)
- mapping[null_index] = next_index++;
- boundinfo->null_index = mapping[null_index];
- }
+ /* Construct an ArrayExpr for the right-hand inputs */
+ arrexpr = makeNode(ArrayExpr);
+ arrexpr->array_typeid =
+ get_array_type(key->parttypid[keynum]);
+ arrexpr->array_collid = key->parttypcoll[keynum];
+ arrexpr->element_typeid = key->parttypid[keynum];
+ arrexpr->elements = elems;
+ arrexpr->multidims = false;
+ arrexpr->location = -1;
- /* Assign mapped index for the default partition. */
- if (default_index != -1)
- {
- /*
- * The default partition accepts any value not
- * specified in the lists of other partitions, hence
- * it should not get mapped index while assigning
- * those for non-null datums.
- */
- Assert(default_index >= 0 &&
- mapping[default_index] == -1);
- mapping[default_index] = next_index++;
- boundinfo->default_index = mapping[default_index];
- }
+ /* Build leftop = ANY (rightop) */
+ saopexpr = makeNode(ScalarArrayOpExpr);
+ saopexpr->opno = operoid;
+ saopexpr->opfuncid = get_opcode(operoid);
+ saopexpr->useOr = true;
+ saopexpr->inputcollid = key->partcollation[keynum];
+ saopexpr->args = list_make2(arg1, arrexpr);
+ saopexpr->location = -1;
- /* All partition must now have a valid mapping */
- Assert(next_index == nparts);
- break;
+ result = (Expr *) saopexpr;
}
-
- case PARTITION_STRATEGY_RANGE:
+ else
{
- boundinfo->kind = (PartitionRangeDatumKind **)
- palloc(ndatums *
- sizeof(PartitionRangeDatumKind *));
- boundinfo->indexes = (int *) palloc((ndatums + 1) *
- sizeof(int));
+ List *elemops = NIL;
+ ListCell *lc;
- for (i = 0; i < ndatums; i++)
+ foreach (lc, elems)
{
- int j;
-
- boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
- sizeof(Datum));
- boundinfo->kind[i] = (PartitionRangeDatumKind *)
- palloc(key->partnatts *
- sizeof(PartitionRangeDatumKind));
- for (j = 0; j < key->partnatts; j++)
- {
- if (rbounds[i]->kind[j] == PARTITION_RANGE_DATUM_VALUE)
- boundinfo->datums[i][j] =
- datumCopy(rbounds[i]->datums[j],
- key->parttypbyval[j],
- key->parttyplen[j]);
- boundinfo->kind[i][j] = rbounds[i]->kind[j];
- }
-
- /*
- * There is no mapping for invalid indexes.
- *
- * Any lower bounds in the rbounds array have invalid
- * indexes assigned, because the values between the
- * previous bound (if there is one) and this (lower)
- * bound are not part of the range of any existing
- * partition.
- */
- if (rbounds[i]->lower)
- boundinfo->indexes[i] = -1;
- else
- {
- int orig_index = rbounds[i]->index;
-
- /* If the old index has no mapping, assign one */
- if (mapping[orig_index] == -1)
- mapping[orig_index] = next_index++;
-
- boundinfo->indexes[i] = mapping[orig_index];
- }
- }
+ Expr *elem = lfirst(lc),
+ *elemop;
- /* Assign mapped index for the default partition. */
- if (default_index != -1)
- {
- Assert(default_index >= 0 && mapping[default_index] == -1);
- mapping[default_index] = next_index++;
- boundinfo->default_index = mapping[default_index];
+ elemop = make_opclause(operoid,
+ BOOLOID,
+ false,
+ arg1, elem,
+ InvalidOid,
+ key->partcollation[keynum]);
+ elemops = lappend(elemops, elemop);
}
- boundinfo->indexes[i] = -1;
- break;
- }
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
- }
+ result = nelems > 1 ? makeBoolExpr(OR_EXPR, elemops, -1) : linitial(elemops);
+ }
+ break;
+ }
- result->boundinfo = boundinfo;
+ case PARTITION_STRATEGY_RANGE:
+ result = make_opclause(operoid,
+ BOOLOID,
+ false,
+ arg1, arg2,
+ InvalidOid,
+ key->partcollation[keynum]);
+ break;
- /*
- * Now assign OIDs from the original array into mapped indexes of the
- * result array. Order of OIDs in the former is defined by the
- * catalog scan that retrieved them, whereas that in the latter is
- * defined by canonicalized representation of the partition bounds.
- */
- for (i = 0; i < nparts; i++)
- result->oids[mapping[i]] = oids[i];
- pfree(mapping);
+ default:
+ elog(ERROR, "invalid partitioning strategy");
+ break;
}
- MemoryContextSwitchTo(oldcxt);
- rel->rd_partdesc = result;
+ return result;
}
/*
- * Are two partition bound collections logically equal?
+ * get_qual_for_hash
+ *
+ * Given a list of partition columns and the modulus and remainder of a
+ * partition, this function returns a CHECK constraint expression Node for
+ * that partition.
*
- * Used in the keep logic of relcache.c (ie, in RelationClearRelation()).
- * This is also useful when b1 and b2 are bound collections of two separate
- * relations, respectively, because PartitionBoundInfo is a canonical
- * representation of partition bounds.
+ * The partition constraint for a hash partition is always a call to the
+ * built-in function satisfies_hash_partition(). The first two arguments are
+ * the modulus and remainder for the partition; the remaining arguments are the
+ * values to be hashed.
*/
-bool
-partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
- PartitionBoundInfo b1, PartitionBoundInfo b2)
+static List *
+get_qual_for_hash(Relation parent, PartitionBoundSpec *spec)
{
+ PartitionKey key = RelationGetPartitionKey(parent);
+ FuncExpr *fexpr;
+ Node *relidConst;
+ Node *modulusConst;
+ Node *remainderConst;
+ List *args;
+ ListCell *partexprs_item;
int i;
- if (b1->strategy != b2->strategy)
- return false;
+ /* Fixed arguments. */
+ relidConst = (Node *) makeConst(OIDOID,
+ -1,
+ InvalidOid,
+ sizeof(Oid),
+ ObjectIdGetDatum(RelationGetRelid(parent)),
+ false,
+ true);
- if (b1->ndatums != b2->ndatums)
- return false;
+ modulusConst = (Node *) makeConst(INT4OID,
+ -1,
+ InvalidOid,
+ sizeof(int32),
+ Int32GetDatum(spec->modulus),
+ false,
+ true);
- if (b1->null_index != b2->null_index)
- return false;
+ remainderConst = (Node *) makeConst(INT4OID,
+ -1,
+ InvalidOid,
+ sizeof(int32),
+ Int32GetDatum(spec->remainder),
+ false,
+ true);
- if (b1->default_index != b2->default_index)
- return false;
+ args = list_make3(relidConst, modulusConst, remainderConst);
+ partexprs_item = list_head(key->partexprs);
- if (b1->strategy == PARTITION_STRATEGY_HASH)
- {
- int greatest_modulus = get_greatest_modulus(b1);
-
- /*
- * If two hash partitioned tables have different greatest moduli,
- * their partition schemes don't match.
- */
- if (greatest_modulus != get_greatest_modulus(b2))
- return false;
-
- /*
- * We arrange the partitions in the ascending order of their modulus
- * and remainders. Also every modulus is factor of next larger
- * modulus. Therefore we can safely store index of a given partition
- * in indexes array at remainder of that partition. Also entries at
- * (remainder + N * modulus) positions in indexes array are all same
- * for (modulus, remainder) specification for any partition. Thus
- * datums array from both the given bounds are same, if and only if
- * their indexes array will be same. So, it suffices to compare
- * indexes array.
- */
- for (i = 0; i < greatest_modulus; i++)
- if (b1->indexes[i] != b2->indexes[i])
- return false;
-
-#ifdef USE_ASSERT_CHECKING
-
- /*
- * Nonetheless make sure that the bounds are indeed same when the
- * indexes match. Hash partition bound stores modulus and remainder
- * at b1->datums[i][0] and b1->datums[i][1] position respectively.
- */
- for (i = 0; i < b1->ndatums; i++)
- Assert((b1->datums[i][0] == b2->datums[i][0] &&
- b1->datums[i][1] == b2->datums[i][1]));
-#endif
- }
- else
+ /* Add an argument for each key column. */
+ for (i = 0; i < key->partnatts; i++)
{
- for (i = 0; i < b1->ndatums; i++)
- {
- int j;
-
- for (j = 0; j < partnatts; j++)
- {
- /* For range partitions, the bounds might not be finite. */
- if (b1->kind != NULL)
- {
- /* The different kinds of bound all differ from each other */
- if (b1->kind[i][j] != b2->kind[i][j])
- return false;
-
- /*
- * Non-finite bounds are equal without further
- * examination.
- */
- if (b1->kind[i][j] != PARTITION_RANGE_DATUM_VALUE)
- continue;
- }
-
- /*
- * Compare the actual values. Note that it would be both
- * incorrect and unsafe to invoke the comparison operator
- * derived from the partitioning specification here. It would
- * be incorrect because we want the relcache entry to be
- * updated for ANY change to the partition bounds, not just
- * those that the partitioning operator thinks are
- * significant. It would be unsafe because we might reach
- * this code in the context of an aborted transaction, and an
- * arbitrary partitioning operator might not be safe in that
- * context. datumIsEqual() should be simple enough to be
- * safe.
- */
- if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
- parttypbyval[j], parttyplen[j]))
- return false;
- }
+ Node *keyCol;
- if (b1->indexes[i] != b2->indexes[i])
- return false;
+ /* Left operand */
+ if (key->partattrs[i] != 0)
+ {
+ keyCol = (Node *) makeVar(1,
+ key->partattrs[i],
+ key->parttypid[i],
+ key->parttypmod[i],
+ key->parttypcoll[i],
+ 0);
+ }
+ else
+ {
+ keyCol = (Node *) copyObject(lfirst(partexprs_item));
+ partexprs_item = lnext(partexprs_item);
}
- /* There are ndatums+1 indexes in case of range partitions */
- if (b1->strategy == PARTITION_STRATEGY_RANGE &&
- b1->indexes[i] != b2->indexes[i])
- return false;
+ args = lappend(args, keyCol);
}
- return true;
+
+ fexpr = makeFuncExpr(F_SATISFIES_HASH_PARTITION,
+ BOOLOID,
+ args,
+ InvalidOid,
+ InvalidOid,
+ COERCE_EXPLICIT_CALL);
+
+ return list_make1(fexpr);
}
/*
- * Return a copy of given PartitionBoundInfo structure. The data types of bounds
- * are described by given partition key specification.
+ * get_qual_for_list
+ *
+ * Returns an implicit-AND list of expressions to use as a list partition's
+ * constraint, given the partition key and bound structures.
+ *
+ * The function returns NIL for a default partition when it's the only
+ * partition since in that case there is no constraint.
*/
-extern PartitionBoundInfo
-partition_bounds_copy(PartitionBoundInfo src,
- PartitionKey key)
+static List *
+get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
{
- PartitionBoundInfo dest;
- int i;
- int ndatums;
- int partnatts;
- int num_indexes;
+ PartitionKey key = RelationGetPartitionKey(parent);
+ List *result;
+ Expr *keyCol;
+ Expr *opexpr;
+ NullTest *nulltest;
+ ListCell *cell;
+ List *elems = NIL;
+ bool list_has_null = false;
+
+ /*
+ * Only single-column list partitioning is supported, so we are concerned
+ * only with the partition key column at index 0.
+ */
+ Assert(key->partnatts == 1);
- dest = (PartitionBoundInfo) palloc(sizeof(PartitionBoundInfoData));
+ /* Construct Var or expression representing the partition column */
+ if (key->partattrs[0] != 0)
+ keyCol = (Expr *) makeVar(1,
+ key->partattrs[0],
+ key->parttypid[0],
+ key->parttypmod[0],
+ key->parttypcoll[0],
+ 0);
+ else
+ keyCol = (Expr *) copyObject(linitial(key->partexprs));
- dest->strategy = src->strategy;
- ndatums = dest->ndatums = src->ndatums;
- partnatts = key->partnatts;
+ /*
+ * For default list partition, collect datums for all the partitions. The
+ * default partition constraint should check that the partition key is
+ * equal to none of those.
+ */
+ if (spec->is_default)
+ {
+ int i;
+ int ndatums = 0;
+ PartitionDesc pdesc = RelationGetPartitionDesc(parent);
+ PartitionBoundInfo boundinfo = pdesc->boundinfo;
- num_indexes = get_partition_bound_num_indexes(src);
+ if (boundinfo)
+ {
+ ndatums = boundinfo->ndatums;
- /* List partitioned tables have only a single partition key. */
- Assert(key->strategy != PARTITION_STRATEGY_LIST || partnatts == 1);
+ if (partition_bound_accepts_nulls(boundinfo))
+ list_has_null = true;
+ }
- dest->datums = (Datum **) palloc(sizeof(Datum *) * ndatums);
+ /*
+ * If default is the only partition, there need not be any partition
+ * constraint on it.
+ */
+ if (ndatums == 0 && !list_has_null)
+ return NIL;
- if (src->kind != NULL)
- {
- dest->kind = (PartitionRangeDatumKind **) palloc(ndatums *
- sizeof(PartitionRangeDatumKind *));
for (i = 0; i < ndatums; i++)
{
- dest->kind[i] = (PartitionRangeDatumKind *) palloc(partnatts *
- sizeof(PartitionRangeDatumKind));
+ Const *val;
+
+ /*
+ * Construct Const from known-not-null datum. We must be careful
+ * to copy the value, because our result has to be able to outlive
+ * the relcache entry we're copying from.
+ */
+ val = makeConst(key->parttypid[0],
+ key->parttypmod[0],
+ key->parttypcoll[0],
+ key->parttyplen[0],
+ datumCopy(*boundinfo->datums[i],
+ key->parttypbyval[0],
+ key->parttyplen[0]),
+ false, /* isnull */
+ key->parttypbyval[0]);
- memcpy(dest->kind[i], src->kind[i],
- sizeof(PartitionRangeDatumKind) * key->partnatts);
+ elems = lappend(elems, val);
}
}
else
- dest->kind = NULL;
+ {
+ /*
+ * Create list of Consts for the allowed values, excluding any nulls.
+ */
+ foreach(cell, spec->listdatums)
+ {
+ Const *val = castNode(Const, lfirst(cell));
+
+ if (val->constisnull)
+ list_has_null = true;
+ else
+ elems = lappend(elems, copyObject(val));
+ }
+ }
- for (i = 0; i < ndatums; i++)
+ if (elems)
{
- int j;
+ /*
+ * Generate the operator expression from the non-null partition
+ * values.
+ */
+ opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
+ keyCol, (Expr *) elems);
+ }
+ else
+ {
+ /*
+ * If there are no partition values, we don't need an operator
+ * expression.
+ */
+ opexpr = NULL;
+ }
+ if (!list_has_null)
+ {
/*
- * For a corresponding to hash partition, datums array will have two
- * elements - modulus and remainder.
+ * Gin up a "col IS NOT NULL" test that will be AND'd with the main
+ * expression. This might seem redundant, but the partition routing
+ * machinery needs it.
*/
- bool hash_part = (key->strategy == PARTITION_STRATEGY_HASH);
- int natts = hash_part ? 2 : partnatts;
+ nulltest = makeNode(NullTest);
+ nulltest->arg = keyCol;
+ nulltest->nulltesttype = IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
- dest->datums[i] = (Datum *) palloc(sizeof(Datum) * natts);
+ result = opexpr ? list_make2(nulltest, opexpr) : list_make1(nulltest);
+ }
+ else
+ {
+ /*
+ * Gin up a "col IS NULL" test that will be OR'd with the main
+ * expression.
+ */
+ nulltest = makeNode(NullTest);
+ nulltest->arg = keyCol;
+ nulltest->nulltesttype = IS_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
- for (j = 0; j < natts; j++)
+ if (opexpr)
{
- bool byval;
- int typlen;
-
- if (hash_part)
- {
- typlen = sizeof(int32); /* Always int4 */
- byval = true; /* int4 is pass-by-value */
- }
- else
- {
- byval = key->parttypbyval[j];
- typlen = key->parttyplen[j];
- }
+ Expr *or;
- if (dest->kind == NULL ||
- dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
- dest->datums[i][j] = datumCopy(src->datums[i][j],
- byval, typlen);
+ or = makeBoolExpr(OR_EXPR, list_make2(nulltest, opexpr), -1);
+ result = list_make1(or);
}
+ else
+ result = list_make1(nulltest);
}
- dest->indexes = (int *) palloc(sizeof(int) * num_indexes);
- memcpy(dest->indexes, src->indexes, sizeof(int) * num_indexes);
-
- dest->null_index = src->null_index;
- dest->default_index = src->default_index;
+ /*
+ * Note that, in general, applying NOT to a constraint expression doesn't
+ * necessarily invert the set of rows it accepts, because NOT (NULL) is
+ * NULL. However, the partition constraints we construct here never
+ * evaluate to NULL, so applying NOT works as intended.
+ */
+ if (spec->is_default)
+ {
+ result = list_make1(make_ands_explicit(result));
+ result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
+ }
- return dest;
+ return result;
}
/*
- * check_new_partition_bound
+ * get_range_key_properties
+ * Returns range partition key information for a given column
+ *
+ * This is a subroutine for get_qual_for_range, and its API is pretty
+ * specialized to that caller.
+ *
+ * Constructs an Expr for the key column (returned in *keyCol) and Consts
+ * for the lower and upper range limits (returned in *lower_val and
+ * *upper_val). For MINVALUE/MAXVALUE limits, NULL is returned instead of
+ * a Const. All of these structures are freshly palloc'd.
*
- * Checks if the new partition's bound overlaps any of the existing partitions
- * of parent. Also performs additional checks as necessary per strategy.
+ * *partexprs_item points to the cell containing the next expression in
+ * the key->partexprs list, or NULL. It may be advanced upon return.
*/
-void
-check_new_partition_bound(char *relname, Relation parent,
- PartitionBoundSpec *spec)
+static void
+get_range_key_properties(PartitionKey key, int keynum,
+ PartitionRangeDatum *ldatum,
+ PartitionRangeDatum *udatum,
+ ListCell **partexprs_item,
+ Expr **keyCol,
+ Const **lower_val, Const **upper_val)
{
- PartitionKey key = RelationGetPartitionKey(parent);
- PartitionDesc partdesc = RelationGetPartitionDesc(parent);
- PartitionBoundInfo boundinfo = partdesc->boundinfo;
- ParseState *pstate = make_parsestate(NULL);
- int with = -1;
- bool overlap = false;
-
- if (spec->is_default)
+ /* Get partition key expression for this column */
+ if (key->partattrs[keynum] != 0)
{
- if (boundinfo == NULL || !partition_bound_has_default(boundinfo))
- return;
-
- /* Default partition already exists, error out. */
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
- relname, get_rel_name(partdesc->oids[boundinfo->default_index])),
- parser_errposition(pstate, spec->location)));
+ *keyCol = (Expr *) makeVar(1,
+ key->partattrs[keynum],
+ key->parttypid[keynum],
+ key->parttypmod[keynum],
+ key->parttypcoll[keynum],
+ 0);
}
-
- switch (key->strategy)
+ else
{
- case PARTITION_STRATEGY_HASH:
- {
- Assert(spec->strategy == PARTITION_STRATEGY_HASH);
- Assert(spec->remainder >= 0 && spec->remainder < spec->modulus);
+ if (*partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ *keyCol = copyObject(lfirst(*partexprs_item));
+ *partexprs_item = lnext(*partexprs_item);
+ }
- if (partdesc->nparts > 0)
- {
- PartitionBoundInfo boundinfo = partdesc->boundinfo;
- Datum **datums = boundinfo->datums;
- int ndatums = boundinfo->ndatums;
- int greatest_modulus;
- int remainder;
- int offset;
- bool valid_modulus = true;
- int prev_modulus, /* Previous largest modulus */
- next_modulus; /* Next largest modulus */
-
- /*
- * Check rule that every modulus must be a factor of the
- * next larger modulus. For example, if you have a bunch
- * of partitions that all have modulus 5, you can add a
- * new partition with modulus 10 or a new partition with
- * modulus 15, but you cannot add both a partition with
- * modulus 10 and a partition with modulus 15, because 10
- * is not a factor of 15.
- *
- * Get the greatest (modulus, remainder) pair contained in
- * boundinfo->datums that is less than or equal to the
- * (spec->modulus, spec->remainder) pair.
- */
- offset = partition_hash_bsearch(key, boundinfo,
- spec->modulus,
- spec->remainder);
- if (offset < 0)
- {
- next_modulus = DatumGetInt32(datums[0][0]);
- valid_modulus = (next_modulus % spec->modulus) == 0;
- }
- else
- {
- prev_modulus = DatumGetInt32(datums[offset][0]);
- valid_modulus = (spec->modulus % prev_modulus) == 0;
-
- if (valid_modulus && (offset + 1) < ndatums)
- {
- next_modulus = DatumGetInt32(datums[offset + 1][0]);
- valid_modulus = (next_modulus % spec->modulus) == 0;
- }
- }
+ /* Get appropriate Const nodes for the bounds */
+ if (ldatum->kind == PARTITION_RANGE_DATUM_VALUE)
+ *lower_val = castNode(Const, copyObject(ldatum->value));
+ else
+ *lower_val = NULL;
- if (!valid_modulus)
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("every hash partition modulus must be a factor of the next larger modulus")));
-
- greatest_modulus = get_greatest_modulus(boundinfo);
- remainder = spec->remainder;
-
- /*
- * Normally, the lowest remainder that could conflict with
- * the new partition is equal to the remainder specified
- * for the new partition, but when the new partition has a
- * modulus higher than any used so far, we need to adjust.
- */
- if (remainder >= greatest_modulus)
- remainder = remainder % greatest_modulus;
-
- /* Check every potentially-conflicting remainder. */
- do
- {
- if (boundinfo->indexes[remainder] != -1)
- {
- overlap = true;
- with = boundinfo->indexes[remainder];
- break;
- }
- remainder += spec->modulus;
- } while (remainder < greatest_modulus);
- }
-
- break;
- }
-
- case PARTITION_STRATEGY_LIST:
- {
- Assert(spec->strategy == PARTITION_STRATEGY_LIST);
-
- if (partdesc->nparts > 0)
- {
- ListCell *cell;
-
- Assert(boundinfo &&
- boundinfo->strategy == PARTITION_STRATEGY_LIST &&
- (boundinfo->ndatums > 0 ||
- partition_bound_accepts_nulls(boundinfo) ||
- partition_bound_has_default(boundinfo)));
-
- foreach(cell, spec->listdatums)
- {
- Const *val = castNode(Const, lfirst(cell));
-
- if (!val->constisnull)
- {
- int offset;
- bool equal;
-
- offset = partition_list_bsearch(key, boundinfo,
- val->constvalue,
- &equal);
- if (offset >= 0 && equal)
- {
- overlap = true;
- with = boundinfo->indexes[offset];
- break;
- }
- }
- else if (partition_bound_accepts_nulls(boundinfo))
- {
- overlap = true;
- with = boundinfo->null_index;
- break;
- }
- }
- }
-
- break;
- }
-
- case PARTITION_STRATEGY_RANGE:
- {
- PartitionRangeBound *lower,
- *upper;
-
- Assert(spec->strategy == PARTITION_STRATEGY_RANGE);
- lower = make_one_range_bound(key, -1, spec->lowerdatums, true);
- upper = make_one_range_bound(key, -1, spec->upperdatums, false);
-
- /*
- * First check if the resulting range would be empty with
- * specified lower and upper bounds
- */
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
- {
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("empty range bound specified for partition \"%s\"",
- relname),
- errdetail("Specified lower bound %s is greater than or equal to upper bound %s.",
- get_range_partbound_string(spec->lowerdatums),
- get_range_partbound_string(spec->upperdatums)),
- parser_errposition(pstate, spec->location)));
- }
-
- if (partdesc->nparts > 0)
- {
- PartitionBoundInfo boundinfo = partdesc->boundinfo;
- int offset;
- bool equal;
-
- Assert(boundinfo &&
- boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
- (boundinfo->ndatums > 0 ||
- partition_bound_has_default(boundinfo)));
-
- /*
- * Test whether the new lower bound (which is treated
- * inclusively as part of the new partition) lies inside
- * an existing partition, or in a gap.
- *
- * If it's inside an existing partition, the bound at
- * offset + 1 will be the upper bound of that partition,
- * and its index will be >= 0.
- *
- * If it's in a gap, the bound at offset + 1 will be the
- * lower bound of the next partition, and its index will
- * be -1. This is also true if there is no next partition,
- * since the index array is initialised with an extra -1
- * at the end.
- */
- offset = partition_range_bsearch(key, boundinfo, lower,
- &equal);
-
- if (boundinfo->indexes[offset + 1] < 0)
- {
- /*
- * Check that the new partition will fit in the gap.
- * For it to fit, the new upper bound must be less
- * than or equal to the lower bound of the next
- * partition, if there is one.
- */
- if (offset + 1 < boundinfo->ndatums)
- {
- int32 cmpval;
- Datum *datums;
- PartitionRangeDatumKind *kind;
- bool is_lower;
-
- datums = boundinfo->datums[offset + 1];
- kind = boundinfo->kind[offset + 1];
- is_lower = (boundinfo->indexes[offset + 1] == -1);
-
- cmpval = partition_rbound_cmp(key, datums, kind,
- is_lower, upper);
- if (cmpval < 0)
- {
- /*
- * The new partition overlaps with the
- * existing partition between offset + 1 and
- * offset + 2.
- */
- overlap = true;
- with = boundinfo->indexes[offset + 2];
- }
- }
- }
- else
- {
- /*
- * The new partition overlaps with the existing
- * partition between offset and offset + 1.
- */
- overlap = true;
- with = boundinfo->indexes[offset + 1];
- }
- }
-
- break;
- }
-
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
- }
-
- if (overlap)
- {
- Assert(with >= 0);
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("partition \"%s\" would overlap partition \"%s\"",
- relname, get_rel_name(partdesc->oids[with])),
- parser_errposition(pstate, spec->location)));
- }
-}
-
-/*
- * check_default_allows_bound
- *
- * This function checks if there exists a row in the default partition that
- * would properly belong to the new partition being added. If it finds one,
- * it throws an error.
- */
-void
-check_default_allows_bound(Relation parent, Relation default_rel,
- PartitionBoundSpec *new_spec)
-{
- List *new_part_constraints;
- List *def_part_constraints;
- List *all_parts;
- ListCell *lc;
-
- new_part_constraints = (new_spec->strategy == PARTITION_STRATEGY_LIST)
- ? get_qual_for_list(parent, new_spec)
- : get_qual_for_range(parent, new_spec, false);
- def_part_constraints =
- get_proposed_default_constraint(new_part_constraints);
-
- /*
- * If the existing constraints on the default partition imply that it will
- * not contain any row that would belong to the new partition, we can
- * avoid scanning the default partition.
- */
- if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
- {
- ereport(INFO,
- (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
- RelationGetRelationName(default_rel))));
- return;
- }
-
- /*
- * Scan the default partition and its subpartitions, and check for rows
- * that do not satisfy the revised partition constraints.
- */
- if (default_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
- all_parts = find_all_inheritors(RelationGetRelid(default_rel),
- AccessExclusiveLock, NULL);
- else
- all_parts = list_make1_oid(RelationGetRelid(default_rel));
-
- foreach(lc, all_parts)
- {
- Oid part_relid = lfirst_oid(lc);
- Relation part_rel;
- Expr *constr;
- Expr *partition_constraint;
- EState *estate;
- HeapTuple tuple;
- ExprState *partqualstate = NULL;
- Snapshot snapshot;
- TupleDesc tupdesc;
- ExprContext *econtext;
- HeapScanDesc scan;
- MemoryContext oldCxt;
- TupleTableSlot *tupslot;
-
- /* Lock already taken above. */
- if (part_relid != RelationGetRelid(default_rel))
- {
- part_rel = heap_open(part_relid, NoLock);
-
- /*
- * If the partition constraints on default partition child imply
- * that it will not contain any row that would belong to the new
- * partition, we can avoid scanning the child table.
- */
- if (PartConstraintImpliedByRelConstraint(part_rel,
- def_part_constraints))
- {
- ereport(INFO,
- (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
- RelationGetRelationName(part_rel))));
-
- heap_close(part_rel, NoLock);
- continue;
- }
- }
- else
- part_rel = default_rel;
-
- /*
- * Only RELKIND_RELATION relations (i.e. leaf partitions) need to be
- * scanned.
- */
- if (part_rel->rd_rel->relkind != RELKIND_RELATION)
- {
- if (part_rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
- ereport(WARNING,
- (errcode(ERRCODE_CHECK_VIOLATION),
- errmsg("skipped scanning foreign table \"%s\" which is a partition of default partition \"%s\"",
- RelationGetRelationName(part_rel),
- RelationGetRelationName(default_rel))));
-
- if (RelationGetRelid(default_rel) != RelationGetRelid(part_rel))
- heap_close(part_rel, NoLock);
-
- continue;
- }
-
- tupdesc = CreateTupleDescCopy(RelationGetDescr(part_rel));
- constr = linitial(def_part_constraints);
- partition_constraint = (Expr *)
- map_partition_varattnos((List *) constr,
- 1, part_rel, parent, NULL);
- estate = CreateExecutorState();
-
- /* Build expression execution states for partition check quals */
- partqualstate = ExecPrepareExpr(partition_constraint, estate);
-
- econtext = GetPerTupleExprContext(estate);
- snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = heap_beginscan(part_rel, snapshot, 0, NULL);
- tupslot = MakeSingleTupleTableSlot(tupdesc);
-
- /*
- * Switch to per-tuple memory context and reset it for each tuple
- * produced, so we don't leak memory.
- */
- oldCxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-
- while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
- {
- ExecStoreTuple(tuple, tupslot, InvalidBuffer, false);
- econtext->ecxt_scantuple = tupslot;
-
- if (!ExecCheck(partqualstate, econtext))
- ereport(ERROR,
- (errcode(ERRCODE_CHECK_VIOLATION),
- errmsg("updated partition constraint for default partition \"%s\" would be violated by some row",
- RelationGetRelationName(default_rel))));
-
- ResetExprContext(econtext);
- CHECK_FOR_INTERRUPTS();
- }
-
- MemoryContextSwitchTo(oldCxt);
- heap_endscan(scan);
- UnregisterSnapshot(snapshot);
- ExecDropSingleTupleTableSlot(tupslot);
- FreeExecutorState(estate);
-
- if (RelationGetRelid(default_rel) != RelationGetRelid(part_rel))
- heap_close(part_rel, NoLock); /* keep the lock until commit */
- }
-}
-
-/*
- * get_partition_parent
- *
- * Returns inheritance parent of a partition by scanning pg_inherits
- *
- * Note: Because this function assumes that the relation whose OID is passed
- * as an argument will have precisely one parent, it should only be called
- * when it is known that the relation is a partition.
- */
-Oid
-get_partition_parent(Oid relid)
-{
- Form_pg_inherits form;
- Relation catalogRelation;
- SysScanDesc scan;
- ScanKeyData key[2];
- HeapTuple tuple;
- Oid result;
-
- catalogRelation = heap_open(InheritsRelationId, AccessShareLock);
-
- ScanKeyInit(&key[0],
- Anum_pg_inherits_inhrelid,
- BTEqualStrategyNumber, F_OIDEQ,
- ObjectIdGetDatum(relid));
- ScanKeyInit(&key[1],
- Anum_pg_inherits_inhseqno,
- BTEqualStrategyNumber, F_INT4EQ,
- Int32GetDatum(1));
-
- scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true,
- NULL, 2, key);
-
- tuple = systable_getnext(scan);
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "could not find tuple for parent of relation %u", relid);
-
- form = (Form_pg_inherits) GETSTRUCT(tuple);
- result = form->inhparent;
-
- systable_endscan(scan);
- heap_close(catalogRelation, AccessShareLock);
-
- return result;
-}
-
-/*
- * get_qual_from_partbound
- * Given a parser node for partition bound, return the list of executable
- * expressions as partition constraint
- */
-List *
-get_qual_from_partbound(Relation rel, Relation parent,
- PartitionBoundSpec *spec)
-{
- PartitionKey key = RelationGetPartitionKey(parent);
- List *my_qual = NIL;
-
- Assert(key != NULL);
-
- switch (key->strategy)
- {
- case PARTITION_STRATEGY_HASH:
- Assert(spec->strategy == PARTITION_STRATEGY_HASH);
- my_qual = get_qual_for_hash(parent, spec);
- break;
-
- case PARTITION_STRATEGY_LIST:
- Assert(spec->strategy == PARTITION_STRATEGY_LIST);
- my_qual = get_qual_for_list(parent, spec);
- break;
-
- case PARTITION_STRATEGY_RANGE:
- Assert(spec->strategy == PARTITION_STRATEGY_RANGE);
- my_qual = get_qual_for_range(parent, spec, false);
- break;
-
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
- }
-
- return my_qual;
-}
-
-/*
- * map_partition_varattnos - maps varattno of any Vars in expr from the
- * attno's of 'from_rel' to the attno's of 'to_rel' partition, each of which
- * may be either a leaf partition or a partitioned table, but both of which
- * must be from the same partitioning hierarchy.
- *
- * Even though all of the same column names must be present in all relations
- * in the hierarchy, and they must also have the same types, the attnos may
- * be different.
- *
- * If found_whole_row is not NULL, *found_whole_row returns whether a
- * whole-row variable was found in the input expression.
- *
- * Note: this will work on any node tree, so really the argument and result
- * should be declared "Node *". But a substantial majority of the callers
- * are working on Lists, so it's less messy to do the casts internally.
- */
-List *
-map_partition_varattnos(List *expr, int fromrel_varno,
- Relation to_rel, Relation from_rel,
- bool *found_whole_row)
-{
- bool my_found_whole_row = false;
-
- if (expr != NIL)
- {
- AttrNumber *part_attnos;
-
- part_attnos = convert_tuples_by_name_map(RelationGetDescr(to_rel),
- RelationGetDescr(from_rel),
- gettext_noop("could not convert row type"));
- expr = (List *) map_variable_attnos((Node *) expr,
- fromrel_varno, 0,
- part_attnos,
- RelationGetDescr(from_rel)->natts,
- RelationGetForm(to_rel)->reltype,
- &my_found_whole_row);
- }
-
- if (found_whole_row)
- *found_whole_row = my_found_whole_row;
-
- return expr;
-}
-
-/*
- * RelationGetPartitionQual
- *
- * Returns a list of partition quals
- */
-List *
-RelationGetPartitionQual(Relation rel)
-{
- /* Quick exit */
- if (!rel->rd_rel->relispartition)
- return NIL;
-
- return generate_partition_qual(rel);
-}
-
-/*
- * get_partition_qual_relid
- *
- * Returns an expression tree describing the passed-in relation's partition
- * constraint. If there is no partition constraint returns NULL; this can
- * happen if the default partition is the only partition.
- */
-Expr *
-get_partition_qual_relid(Oid relid)
-{
- Relation rel = heap_open(relid, AccessShareLock);
- Expr *result = NULL;
- List *and_args;
-
- /* Do the work only if this relation is a partition. */
- if (rel->rd_rel->relispartition)
- {
- and_args = generate_partition_qual(rel);
-
- if (and_args == NIL)
- result = NULL;
- else if (list_length(and_args) > 1)
- result = makeBoolExpr(AND_EXPR, and_args, -1);
- else
- result = linitial(and_args);
- }
-
- /* Keep the lock. */
- heap_close(rel, NoLock);
-
- return result;
-}
-
-/* Module-local functions */
-
-/*
- * get_partition_operator
- *
- * Return oid of the operator of given strategy for a given partition key
- * column.
- */
-static Oid
-get_partition_operator(PartitionKey key, int col, StrategyNumber strategy,
- bool *need_relabel)
-{
- Oid operoid;
-
- /*
- * First check if there exists an operator of the given strategy, with
- * this column's type as both its lefttype and righttype, in the
- * partitioning operator family specified for the column.
- */
- operoid = get_opfamily_member(key->partopfamily[col],
- key->parttypid[col],
- key->parttypid[col],
- strategy);
-
- /*
- * If one doesn't exist, we must resort to using an operator in the same
- * operator family but with the operator class declared input type. It is
- * OK to do so, because the column's type is known to be binary-coercible
- * with the operator class input type (otherwise, the operator class in
- * question would not have been accepted as the partitioning operator
- * class). We must however inform the caller to wrap the non-Const
- * expression with a RelabelType node to denote the implicit coercion. It
- * ensures that the resulting expression structurally matches similarly
- * processed expressions within the optimizer.
- */
- if (!OidIsValid(operoid))
- {
- operoid = get_opfamily_member(key->partopfamily[col],
- key->partopcintype[col],
- key->partopcintype[col],
- strategy);
- if (!OidIsValid(operoid))
- elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
- strategy, key->partopcintype[col], key->partopcintype[col],
- key->partopfamily[col]);
- *need_relabel = true;
- }
- else
- *need_relabel = false;
-
- return operoid;
-}
-
-/*
- * make_partition_op_expr
- * Returns an Expr for the given partition key column with arg1 and
- * arg2 as its leftop and rightop, respectively
- */
-static Expr *
-make_partition_op_expr(PartitionKey key, int keynum,
- uint16 strategy, Expr *arg1, Expr *arg2)
-{
- Oid operoid;
- bool need_relabel = false;
- Expr *result = NULL;
-
- /* Get the correct btree operator for this partitioning column */
- operoid = get_partition_operator(key, keynum, strategy, &need_relabel);
-
- /*
- * Chosen operator may be such that the non-Const operand needs to be
- * coerced, so apply the same; see the comment in
- * get_partition_operator().
- */
- if (!IsA(arg1, Const) &&
- (need_relabel ||
- key->partcollation[keynum] != key->parttypcoll[keynum]))
- arg1 = (Expr *) makeRelabelType(arg1,
- key->partopcintype[keynum],
- -1,
- key->partcollation[keynum],
- COERCE_EXPLICIT_CAST);
-
- /* Generate the actual expression */
- switch (key->strategy)
- {
- case PARTITION_STRATEGY_LIST:
- {
- List *elems = (List *) arg2;
- int nelems = list_length(elems);
-
- Assert(nelems >= 1);
- Assert(keynum == 0);
-
- if (nelems > 1 &&
- !type_is_array(key->parttypid[keynum]))
- {
- ArrayExpr *arrexpr;
- ScalarArrayOpExpr *saopexpr;
-
- /* Construct an ArrayExpr for the right-hand inputs */
- arrexpr = makeNode(ArrayExpr);
- arrexpr->array_typeid =
- get_array_type(key->parttypid[keynum]);
- arrexpr->array_collid = key->parttypcoll[keynum];
- arrexpr->element_typeid = key->parttypid[keynum];
- arrexpr->elements = elems;
- arrexpr->multidims = false;
- arrexpr->location = -1;
-
- /* Build leftop = ANY (rightop) */
- saopexpr = makeNode(ScalarArrayOpExpr);
- saopexpr->opno = operoid;
- saopexpr->opfuncid = get_opcode(operoid);
- saopexpr->useOr = true;
- saopexpr->inputcollid = key->partcollation[keynum];
- saopexpr->args = list_make2(arg1, arrexpr);
- saopexpr->location = -1;
-
- result = (Expr *) saopexpr;
- }
- else
- {
- List *elemops = NIL;
- ListCell *lc;
-
- foreach (lc, elems)
- {
- Expr *elem = lfirst(lc),
- *elemop;
-
- elemop = make_opclause(operoid,
- BOOLOID,
- false,
- arg1, elem,
- InvalidOid,
- key->partcollation[keynum]);
- elemops = lappend(elemops, elemop);
- }
-
- result = nelems > 1 ? makeBoolExpr(OR_EXPR, elemops, -1) : linitial(elemops);
- }
- break;
- }
-
- case PARTITION_STRATEGY_RANGE:
- result = make_opclause(operoid,
- BOOLOID,
- false,
- arg1, arg2,
- InvalidOid,
- key->partcollation[keynum]);
- break;
-
- default:
- elog(ERROR, "invalid partitioning strategy");
- break;
- }
-
- return result;
-}
-
-/*
- * get_qual_for_hash
- *
- * Given a list of partition columns, modulus and remainder corresponding to a
- * partition, this function returns CHECK constraint expression Node for that
- * partition.
- *
- * The partition constraint for a hash partition is always a call to the
- * built-in function satisfies_hash_partition(). The first two arguments are
- * the modulus and remainder for the partition; the remaining arguments are the
- * values to be hashed.
- */
-static List *
-get_qual_for_hash(Relation parent, PartitionBoundSpec *spec)
-{
- PartitionKey key = RelationGetPartitionKey(parent);
- FuncExpr *fexpr;
- Node *relidConst;
- Node *modulusConst;
- Node *remainderConst;
- List *args;
- ListCell *partexprs_item;
- int i;
-
- /* Fixed arguments. */
- relidConst = (Node *) makeConst(OIDOID,
- -1,
- InvalidOid,
- sizeof(Oid),
- ObjectIdGetDatum(RelationGetRelid(parent)),
- false,
- true);
-
- modulusConst = (Node *) makeConst(INT4OID,
- -1,
- InvalidOid,
- sizeof(int32),
- Int32GetDatum(spec->modulus),
- false,
- true);
-
- remainderConst = (Node *) makeConst(INT4OID,
- -1,
- InvalidOid,
- sizeof(int32),
- Int32GetDatum(spec->remainder),
- false,
- true);
-
- args = list_make3(relidConst, modulusConst, remainderConst);
- partexprs_item = list_head(key->partexprs);
-
- /* Add an argument for each key column. */
- for (i = 0; i < key->partnatts; i++)
- {
- Node *keyCol;
-
- /* Left operand */
- if (key->partattrs[i] != 0)
- {
- keyCol = (Node *) makeVar(1,
- key->partattrs[i],
- key->parttypid[i],
- key->parttypmod[i],
- key->parttypcoll[i],
- 0);
- }
- else
- {
- keyCol = (Node *) copyObject(lfirst(partexprs_item));
- partexprs_item = lnext(partexprs_item);
- }
-
- args = lappend(args, keyCol);
- }
-
- fexpr = makeFuncExpr(F_SATISFIES_HASH_PARTITION,
- BOOLOID,
- args,
- InvalidOid,
- InvalidOid,
- COERCE_EXPLICIT_CALL);
-
- return list_make1(fexpr);
-}
-
-/*
- * get_qual_for_list
- *
- * Returns an implicit-AND list of expressions to use as a list partition's
- * constraint, given the partition key and bound structures.
- *
- * The function returns NIL for a default partition when it's the only
- * partition since in that case there is no constraint.
- */
-static List *
-get_qual_for_list(Relation parent, PartitionBoundSpec *spec)
-{
- PartitionKey key = RelationGetPartitionKey(parent);
- List *result;
- Expr *keyCol;
- Expr *opexpr;
- NullTest *nulltest;
- ListCell *cell;
- List *elems = NIL;
- bool list_has_null = false;
-
- /*
- * Only single-column list partitioning is supported, so we are worried
- * only about the partition key with index 0.
- */
- Assert(key->partnatts == 1);
-
- /* Construct Var or expression representing the partition column */
- if (key->partattrs[0] != 0)
- keyCol = (Expr *) makeVar(1,
- key->partattrs[0],
- key->parttypid[0],
- key->parttypmod[0],
- key->parttypcoll[0],
- 0);
- else
- keyCol = (Expr *) copyObject(linitial(key->partexprs));
-
- /*
- * For default list partition, collect datums for all the partitions. The
- * default partition constraint should check that the partition key is
- * equal to none of those.
- */
- if (spec->is_default)
- {
- int i;
- int ndatums = 0;
- PartitionDesc pdesc = RelationGetPartitionDesc(parent);
- PartitionBoundInfo boundinfo = pdesc->boundinfo;
-
- if (boundinfo)
- {
- ndatums = boundinfo->ndatums;
-
- if (partition_bound_accepts_nulls(boundinfo))
- list_has_null = true;
- }
-
- /*
- * If default is the only partition, there need not be any partition
- * constraint on it.
- */
- if (ndatums == 0 && !list_has_null)
- return NIL;
-
- for (i = 0; i < ndatums; i++)
- {
- Const *val;
-
- /*
- * Construct Const from known-not-null datum. We must be careful
- * to copy the value, because our result has to be able to outlive
- * the relcache entry we're copying from.
- */
- val = makeConst(key->parttypid[0],
- key->parttypmod[0],
- key->parttypcoll[0],
- key->parttyplen[0],
- datumCopy(*boundinfo->datums[i],
- key->parttypbyval[0],
- key->parttyplen[0]),
- false, /* isnull */
- key->parttypbyval[0]);
-
- elems = lappend(elems, val);
- }
- }
- else
- {
- /*
- * Create list of Consts for the allowed values, excluding any nulls.
- */
- foreach(cell, spec->listdatums)
- {
- Const *val = castNode(Const, lfirst(cell));
-
- if (val->constisnull)
- list_has_null = true;
- else
- elems = lappend(elems, copyObject(val));
- }
- }
-
- if (elems)
- {
- /*
- * Generate the operator expression from the non-null partition
- * values.
- */
- opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
- keyCol, (Expr *) elems);
- }
- else
- {
- /*
- * If there are no partition values, we don't need an operator
- * expression.
- */
- opexpr = NULL;
- }
-
- if (!list_has_null)
- {
- /*
- * Gin up a "col IS NOT NULL" test that will be AND'd with the main
- * expression. This might seem redundant, but the partition routing
- * machinery needs it.
- */
- nulltest = makeNode(NullTest);
- nulltest->arg = keyCol;
- nulltest->nulltesttype = IS_NOT_NULL;
- nulltest->argisrow = false;
- nulltest->location = -1;
-
- result = opexpr ? list_make2(nulltest, opexpr) : list_make1(nulltest);
- }
- else
- {
- /*
- * Gin up a "col IS NULL" test that will be OR'd with the main
- * expression.
- */
- nulltest = makeNode(NullTest);
- nulltest->arg = keyCol;
- nulltest->nulltesttype = IS_NULL;
- nulltest->argisrow = false;
- nulltest->location = -1;
-
- if (opexpr)
- {
- Expr *or;
-
- or = makeBoolExpr(OR_EXPR, list_make2(nulltest, opexpr), -1);
- result = list_make1(or);
- }
- else
- result = list_make1(nulltest);
- }
-
- /*
- * Note that, in general, applying NOT to a constraint expression doesn't
- * necessarily invert the set of rows it accepts, because NOT (NULL) is
- * NULL. However, the partition constraints we construct here never
- * evaluate to NULL, so applying NOT works as intended.
- */
- if (spec->is_default)
- {
- result = list_make1(make_ands_explicit(result));
- result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
- }
-
- return result;
-}
-
-/*
- * get_range_key_properties
- * Returns range partition key information for a given column
- *
- * This is a subroutine for get_qual_for_range, and its API is pretty
- * specialized to that caller.
- *
- * Constructs an Expr for the key column (returned in *keyCol) and Consts
- * for the lower and upper range limits (returned in *lower_val and
- * *upper_val). For MINVALUE/MAXVALUE limits, NULL is returned instead of
- * a Const. All of these structures are freshly palloc'd.
- *
- * *partexprs_item points to the cell containing the next expression in
- * the key->partexprs list, or NULL. It may be advanced upon return.
- */
-static void
-get_range_key_properties(PartitionKey key, int keynum,
- PartitionRangeDatum *ldatum,
- PartitionRangeDatum *udatum,
- ListCell **partexprs_item,
- Expr **keyCol,
- Const **lower_val, Const **upper_val)
-{
- /* Get partition key expression for this column */
- if (key->partattrs[keynum] != 0)
- {
- *keyCol = (Expr *) makeVar(1,
- key->partattrs[keynum],
- key->parttypid[keynum],
- key->parttypmod[keynum],
- key->parttypcoll[keynum],
- 0);
- }
- else
- {
- if (*partexprs_item == NULL)
- elog(ERROR, "wrong number of partition key expressions");
- *keyCol = copyObject(lfirst(*partexprs_item));
- *partexprs_item = lnext(*partexprs_item);
- }
-
- /* Get appropriate Const nodes for the bounds */
- if (ldatum->kind == PARTITION_RANGE_DATUM_VALUE)
- *lower_val = castNode(Const, copyObject(ldatum->value));
- else
- *lower_val = NULL;
-
- if (udatum->kind == PARTITION_RANGE_DATUM_VALUE)
- *upper_val = castNode(Const, copyObject(udatum->value));
- else
- *upper_val = NULL;
-}
-
- /*
- * get_range_nulltest
- *
- * A non-default range partition table does not currently allow partition
- * keys to be null, so emit an IS NOT NULL expression for each key column.
- */
-static List *
-get_range_nulltest(PartitionKey key)
-{
- List *result = NIL;
- NullTest *nulltest;
- ListCell *partexprs_item;
- int i;
-
- partexprs_item = list_head(key->partexprs);
- for (i = 0; i < key->partnatts; i++)
- {
- Expr *keyCol;
-
- if (key->partattrs[i] != 0)
- {
- keyCol = (Expr *) makeVar(1,
- key->partattrs[i],
- key->parttypid[i],
- key->parttypmod[i],
- key->parttypcoll[i],
- 0);
- }
- else
- {
- if (partexprs_item == NULL)
- elog(ERROR, "wrong number of partition key expressions");
- keyCol = copyObject(lfirst(partexprs_item));
- partexprs_item = lnext(partexprs_item);
- }
-
- nulltest = makeNode(NullTest);
- nulltest->arg = keyCol;
- nulltest->nulltesttype = IS_NOT_NULL;
- nulltest->argisrow = false;
- nulltest->location = -1;
- result = lappend(result, nulltest);
- }
-
- return result;
-}
-
-/*
- * get_qual_for_range
- *
- * Returns an implicit-AND list of expressions to use as a range partition's
- * constraint, given the partition key and bound structures.
- *
- * For a multi-column range partition key, say (a, b, c), with (al, bl, cl)
- * as the lower bound tuple and (au, bu, cu) as the upper bound tuple, we
- * generate an expression tree of the following form:
- *
- * (a IS NOT NULL) and (b IS NOT NULL) and (c IS NOT NULL)
- * AND
- * (a > al OR (a = al AND b > bl) OR (a = al AND b = bl AND c >= cl))
- * AND
- * (a < au OR (a = au AND b < bu) OR (a = au AND b = bu AND c < cu))
- *
- * It is often the case that a prefix of lower and upper bound tuples contains
- * the same values, for example, (al = au), in which case, we will emit an
- * expression tree of the following form:
- *
- * (a IS NOT NULL) and (b IS NOT NULL) and (c IS NOT NULL)
- * AND
- * (a = al)
- * AND
- * (b > bl OR (b = bl AND c >= cl))
- * AND
- * (b < bu OR (b = bu AND c < cu))
- *
- * If a bound datum is either MINVALUE or MAXVALUE, these expressions are
- * simplified using the fact that any value is greater than MINVALUE and less
- * than MAXVALUE. So, for example, if cu = MAXVALUE, c < cu is automatically
- * true, and we need not emit any expression for it, and the last line becomes
- *
- * (b < bu) OR (b = bu), which is simplified to (b <= bu)
- *
- * In most common cases with only one partition column, say a, the following
- * expression tree will be generated: a IS NOT NULL AND a >= al AND a < au
- *
- * For default partition, it returns the negation of the constraints of all
- * the other partitions.
- *
- * External callers should pass for_default as false; we set it to true only
- * when recursing.
- */
-static List *
-get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
- bool for_default)
-{
- List *result = NIL;
- ListCell *cell1,
- *cell2,
- *partexprs_item,
- *partexprs_item_saved;
- int i,
- j;
- PartitionRangeDatum *ldatum,
- *udatum;
- PartitionKey key = RelationGetPartitionKey(parent);
- Expr *keyCol;
- Const *lower_val,
- *upper_val;
- List *lower_or_arms,
- *upper_or_arms;
- int num_or_arms,
- current_or_arm;
- ListCell *lower_or_start_datum,
- *upper_or_start_datum;
- bool need_next_lower_arm,
- need_next_upper_arm;
-
- if (spec->is_default)
- {
- List *or_expr_args = NIL;
- PartitionDesc pdesc = RelationGetPartitionDesc(parent);
- Oid *inhoids = pdesc->oids;
- int nparts = pdesc->nparts,
- i;
-
- for (i = 0; i < nparts; i++)
- {
- Oid inhrelid = inhoids[i];
- HeapTuple tuple;
- Datum datum;
- bool isnull;
- PartitionBoundSpec *bspec;
-
- tuple = SearchSysCache1(RELOID, inhrelid);
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "cache lookup failed for relation %u", inhrelid);
-
- datum = SysCacheGetAttr(RELOID, tuple,
- Anum_pg_class_relpartbound,
- &isnull);
-
- Assert(!isnull);
- bspec = (PartitionBoundSpec *)
- stringToNode(TextDatumGetCString(datum));
- if (!IsA(bspec, PartitionBoundSpec))
- elog(ERROR, "expected PartitionBoundSpec");
-
- if (!bspec->is_default)
- {
- List *part_qual;
-
- part_qual = get_qual_for_range(parent, bspec, true);
-
- /*
- * AND the constraints of the partition and add to
- * or_expr_args
- */
- or_expr_args = lappend(or_expr_args, list_length(part_qual) > 1
- ? makeBoolExpr(AND_EXPR, part_qual, -1)
- : linitial(part_qual));
- }
- ReleaseSysCache(tuple);
- }
-
- if (or_expr_args != NIL)
- {
- Expr *other_parts_constr;
-
- /*
- * Combine the constraints obtained for non-default partitions
- * using OR. As requested, each of the OR's args doesn't include
- * the NOT NULL test for partition keys (which is to avoid its
- * useless repetition). Add the same now.
- */
- other_parts_constr =
- makeBoolExpr(AND_EXPR,
- lappend(get_range_nulltest(key),
- list_length(or_expr_args) > 1
- ? makeBoolExpr(OR_EXPR, or_expr_args,
- -1)
- : linitial(or_expr_args)),
- -1);
-
- /*
- * Finally, the default partition contains everything *NOT*
- * contained in the non-default partitions.
- */
- result = list_make1(makeBoolExpr(NOT_EXPR,
- list_make1(other_parts_constr), -1));
- }
-
- return result;
- }
-
- lower_or_start_datum = list_head(spec->lowerdatums);
- upper_or_start_datum = list_head(spec->upperdatums);
- num_or_arms = key->partnatts;
-
- /*
- * If it is the recursive call for default, we skip the get_range_nulltest
- * to avoid accumulating the NullTest on the same keys for each partition.
- */
- if (!for_default)
- result = get_range_nulltest(key);
-
- /*
- * Iterate over the key columns and check if the corresponding lower and
- * upper datums are equal using the btree equality operator for the
- * column's type. If equal, we emit single keyCol = common_value
- * expression. Starting from the first column for which the corresponding
- * lower and upper bound datums are not equal, we generate OR expressions
- * as shown in the function's header comment.
- */
- i = 0;
- partexprs_item = list_head(key->partexprs);
- partexprs_item_saved = partexprs_item; /* placate compiler */
- forboth(cell1, spec->lowerdatums, cell2, spec->upperdatums)
- {
- EState *estate;
- MemoryContext oldcxt;
- Expr *test_expr;
- ExprState *test_exprstate;
- Datum test_result;
- bool isNull;
-
- ldatum = castNode(PartitionRangeDatum, lfirst(cell1));
- udatum = castNode(PartitionRangeDatum, lfirst(cell2));
-
- /*
- * Since get_range_key_properties() modifies partexprs_item, and we
- * might need to start over from the previous expression in the later
- * part of this function, save away the current value.
- */
- partexprs_item_saved = partexprs_item;
-
- get_range_key_properties(key, i, ldatum, udatum,
- &partexprs_item,
- &keyCol,
- &lower_val, &upper_val);
-
- /*
- * If either value is NULL, the corresponding partition bound is
- * either MINVALUE or MAXVALUE, and we treat them as unequal, because
- * even if they're the same, there is no common value to equate the
- * key column with.
- */
- if (!lower_val || !upper_val)
- break;
-
- /* Create the test expression */
- estate = CreateExecutorState();
- oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
- test_expr = make_partition_op_expr(key, i, BTEqualStrategyNumber,
- (Expr *) lower_val,
- (Expr *) upper_val);
- fix_opfuncids((Node *) test_expr);
- test_exprstate = ExecInitExpr(test_expr, NULL);
- test_result = ExecEvalExprSwitchContext(test_exprstate,
- GetPerTupleExprContext(estate),
- &isNull);
- MemoryContextSwitchTo(oldcxt);
- FreeExecutorState(estate);
-
- /* If not equal, go generate the OR expressions */
- if (!DatumGetBool(test_result))
- break;
-
- /*
- * The bounds for the last key column can't be equal, because such a
- * range partition would never be allowed to be defined (it would have
- * an empty range otherwise).
- */
- if (i == key->partnatts - 1)
- elog(ERROR, "invalid range bound specification");
-
- /* Equal, so generate keyCol = lower_val expression */
- result = lappend(result,
- make_partition_op_expr(key, i, BTEqualStrategyNumber,
- keyCol, (Expr *) lower_val));
-
- i++;
- }
-
- /* First pair of lower_val and upper_val that are not equal. */
- lower_or_start_datum = cell1;
- upper_or_start_datum = cell2;
-
- /* OR will have as many arms as there are key columns left. */
- num_or_arms = key->partnatts - i;
- current_or_arm = 0;
- lower_or_arms = upper_or_arms = NIL;
- need_next_lower_arm = need_next_upper_arm = true;
- while (current_or_arm < num_or_arms)
- {
- List *lower_or_arm_args = NIL,
- *upper_or_arm_args = NIL;
-
- /* Restart scan of columns from the i'th one */
- j = i;
- partexprs_item = partexprs_item_saved;
-
- for_both_cell(cell1, lower_or_start_datum, cell2, upper_or_start_datum)
- {
- PartitionRangeDatum *ldatum_next = NULL,
- *udatum_next = NULL;
-
- ldatum = castNode(PartitionRangeDatum, lfirst(cell1));
- if (lnext(cell1))
- ldatum_next = castNode(PartitionRangeDatum,
- lfirst(lnext(cell1)));
- udatum = castNode(PartitionRangeDatum, lfirst(cell2));
- if (lnext(cell2))
- udatum_next = castNode(PartitionRangeDatum,
- lfirst(lnext(cell2)));
- get_range_key_properties(key, j, ldatum, udatum,
- &partexprs_item,
- &keyCol,
- &lower_val, &upper_val);
-
- if (need_next_lower_arm && lower_val)
- {
- uint16 strategy;
-
- /*
- * For the non-last columns of this arm, use the EQ operator.
- * For the last column of this arm, use GT, unless this is the
- * last column of the whole bound check, or the next bound
- * datum is MINVALUE, in which case use GE.
- */
- if (j - i < current_or_arm)
- strategy = BTEqualStrategyNumber;
- else if (j == key->partnatts - 1 ||
- (ldatum_next &&
- ldatum_next->kind == PARTITION_RANGE_DATUM_MINVALUE))
- strategy = BTGreaterEqualStrategyNumber;
- else
- strategy = BTGreaterStrategyNumber;
-
- lower_or_arm_args = lappend(lower_or_arm_args,
- make_partition_op_expr(key, j,
- strategy,
- keyCol,
- (Expr *) lower_val));
- }
-
- if (need_next_upper_arm && upper_val)
- {
- uint16 strategy;
-
- /*
- * For the non-last columns of this arm, use the EQ operator.
- * For the last column of this arm, use LT, unless the next
- * bound datum is MAXVALUE, in which case use LE.
- */
- if (j - i < current_or_arm)
- strategy = BTEqualStrategyNumber;
- else if (udatum_next &&
- udatum_next->kind == PARTITION_RANGE_DATUM_MAXVALUE)
- strategy = BTLessEqualStrategyNumber;
- else
- strategy = BTLessStrategyNumber;
-
- upper_or_arm_args = lappend(upper_or_arm_args,
- make_partition_op_expr(key, j,
- strategy,
- keyCol,
- (Expr *) upper_val));
- }
-
- /*
- * Did we generate enough of OR's arguments? First arm considers
- * the first of the remaining columns, second arm considers first
- * two of the remaining columns, and so on.
- */
- ++j;
- if (j - i > current_or_arm)
- {
- /*
- * We must not emit any more arms if the new column that will
- * be considered is unbounded, or this one was.
- */
- if (!lower_val || !ldatum_next ||
- ldatum_next->kind != PARTITION_RANGE_DATUM_VALUE)
- need_next_lower_arm = false;
- if (!upper_val || !udatum_next ||
- udatum_next->kind != PARTITION_RANGE_DATUM_VALUE)
- need_next_upper_arm = false;
- break;
- }
- }
-
- if (lower_or_arm_args != NIL)
- lower_or_arms = lappend(lower_or_arms,
- list_length(lower_or_arm_args) > 1
- ? makeBoolExpr(AND_EXPR, lower_or_arm_args, -1)
- : linitial(lower_or_arm_args));
-
- if (upper_or_arm_args != NIL)
- upper_or_arms = lappend(upper_or_arms,
- list_length(upper_or_arm_args) > 1
- ? makeBoolExpr(AND_EXPR, upper_or_arm_args, -1)
- : linitial(upper_or_arm_args));
-
- /* If no work to do in the next iteration, break away. */
- if (!need_next_lower_arm && !need_next_upper_arm)
- break;
-
- ++current_or_arm;
- }
-
- /*
- * Generate the OR expressions for each of lower and upper bounds (if
- * required), and append to the list of implicitly ANDed list of
- * expressions.
- */
- if (lower_or_arms != NIL)
- result = lappend(result,
- list_length(lower_or_arms) > 1
- ? makeBoolExpr(OR_EXPR, lower_or_arms, -1)
- : linitial(lower_or_arms));
- if (upper_or_arms != NIL)
- result = lappend(result,
- list_length(upper_or_arms) > 1
- ? makeBoolExpr(OR_EXPR, upper_or_arms, -1)
- : linitial(upper_or_arms));
-
- /*
- * As noted above, for non-default, we return list with constant TRUE. If
- * the result is NIL during the recursive call for default, it implies
- * this is the only other partition which can hold every value of the key
- * except NULL. Hence we return the NullTest result skipped earlier.
- */
- if (result == NIL)
- result = for_default
- ? get_range_nulltest(key)
- : list_make1(makeBoolConst(true, false));
-
- return result;
-}
-
-/*
- * generate_partition_qual
- *
- * Generate partition predicate from rel's partition bound expression. The
- * function returns a NIL list if there is no predicate.
- *
- * Result expression tree is stored CacheMemoryContext to ensure it survives
- * as long as the relcache entry. But we should be running in a less long-lived
- * working context. To avoid leaking cache memory if this routine fails partway
- * through, we build in working memory and then copy the completed structure
- * into cache memory.
- */
-static List *
-generate_partition_qual(Relation rel)
-{
- HeapTuple tuple;
- MemoryContext oldcxt;
- Datum boundDatum;
- bool isnull;
- PartitionBoundSpec *bound;
- List *my_qual = NIL,
- *result = NIL;
- Relation parent;
- bool found_whole_row;
-
- /* Guard against stack overflow due to overly deep partition tree */
- check_stack_depth();
-
- /* Quick copy */
- if (rel->rd_partcheck != NIL)
- return copyObject(rel->rd_partcheck);
-
- /* Grab at least an AccessShareLock on the parent table */
- parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
- AccessShareLock);
-
- /* Get pg_class.relpartbound */
- tuple = SearchSysCache1(RELOID, RelationGetRelid(rel));
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "cache lookup failed for relation %u",
- RelationGetRelid(rel));
-
- boundDatum = SysCacheGetAttr(RELOID, tuple,
- Anum_pg_class_relpartbound,
- &isnull);
- if (isnull) /* should not happen */
- elog(ERROR, "relation \"%s\" has relpartbound = null",
- RelationGetRelationName(rel));
- bound = castNode(PartitionBoundSpec,
- stringToNode(TextDatumGetCString(boundDatum)));
- ReleaseSysCache(tuple);
-
- my_qual = get_qual_from_partbound(rel, parent, bound);
-
- /* Add the parent's quals to the list (if any) */
- if (parent->rd_rel->relispartition)
- result = list_concat(generate_partition_qual(parent), my_qual);
+ if (udatum->kind == PARTITION_RANGE_DATUM_VALUE)
+ *upper_val = castNode(Const, copyObject(udatum->value));
else
- result = my_qual;
-
- /*
- * Change Vars to have partition's attnos instead of the parent's. We do
- * this after we concatenate the parent's quals, because we want every Var
- * in it to bear this relation's attnos. It's safe to assume varno = 1
- * here.
- */
- result = map_partition_varattnos(result, 1, rel, parent,
- &found_whole_row);
- /* There can never be a whole-row reference here */
- if (found_whole_row)
- elog(ERROR, "unexpected whole-row reference found in partition key");
-
- /* Save a copy in the relcache */
- oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
- rel->rd_partcheck = copyObject(result);
- MemoryContextSwitchTo(oldcxt);
-
- /* Keep the parent locked until commit */
- heap_close(parent, NoLock);
-
- return result;
-}
-
-/*
- * get_partition_for_tuple
- * Finds partition of relation which accepts the partition key specified
- * in values and isnull
- *
- * Return value is index of the partition (>= 0 and < partdesc->nparts) if one
- * found or -1 if none found.
- */
-int
-get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
-{
- int bound_offset;
- int part_index = -1;
- PartitionKey key = RelationGetPartitionKey(relation);
- PartitionDesc partdesc = RelationGetPartitionDesc(relation);
-
- /* Route as appropriate based on partitioning strategy. */
- switch (key->strategy)
- {
- case PARTITION_STRATEGY_HASH:
- {
- PartitionBoundInfo boundinfo = partdesc->boundinfo;
- int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
-
- part_index = boundinfo->indexes[rowHash % greatest_modulus];
- }
- break;
-
- case PARTITION_STRATEGY_LIST:
- if (isnull[0])
- {
- if (partition_bound_accepts_nulls(partdesc->boundinfo))
- part_index = partdesc->boundinfo->null_index;
- }
- else
- {
- bool equal = false;
-
- bound_offset = partition_list_bsearch(key,
- partdesc->boundinfo,
- values[0], &equal);
- if (bound_offset >= 0 && equal)
- part_index = partdesc->boundinfo->indexes[bound_offset];
- }
- break;
-
- case PARTITION_STRATEGY_RANGE:
- {
- bool equal = false,
- range_partkey_has_null = false;
- int i;
-
- /*
- * No range includes NULL, so this will be accepted by the
- * default partition if there is one, and otherwise rejected.
- */
- for (i = 0; i < key->partnatts; i++)
- {
- if (isnull[i])
- {
- range_partkey_has_null = true;
- break;
- }
- }
-
- if (!range_partkey_has_null)
- {
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
- /*
- * The bound at bound_offset is less than or equal to the
- * tuple value, so the bound at offset+1 is the upper
- * bound of the partition we're looking for, if there
- * actually exists one.
- */
- part_index = partdesc->boundinfo->indexes[bound_offset + 1];
- }
- }
- break;
-
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) key->strategy);
- }
-
- /*
- * part_index < 0 means we failed to find a partition of this parent. Use
- * the default partition, if there is one.
- */
- if (part_index < 0)
- part_index = partdesc->boundinfo->default_index;
-
- return part_index;
+ *upper_val = NULL;
}
-/*
- * Checks if any of the 'attnums' is a partition key attribute for rel
- *
- * Sets *used_in_expr if any of the 'attnums' is found to be referenced in some
- * partition key expression. It's possible for a column to be both used
- * directly and as part of an expression; if that happens, *used_in_expr may
- * end up as either true or false. That's OK for current uses of this
- * function, because *used_in_expr is only used to tailor the error message
- * text.
- */
-bool
-has_partition_attrs(Relation rel, Bitmapset *attnums,
- bool *used_in_expr)
+ /*
+ * get_range_nulltest
+ *
+ * A non-default range partition table does not currently allow partition
+ * keys to be null, so emit an IS NOT NULL expression for each key column.
+ */
+static List *
+get_range_nulltest(PartitionKey key)
{
- PartitionKey key;
- int partnatts;
- List *partexprs;
+ List *result = NIL;
+ NullTest *nulltest;
ListCell *partexprs_item;
int i;
- if (attnums == NULL || rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
- return false;
-
- key = RelationGetPartitionKey(rel);
- partnatts = get_partition_natts(key);
- partexprs = get_partition_exprs(key);
-
- partexprs_item = list_head(partexprs);
- for (i = 0; i < partnatts; i++)
+ partexprs_item = list_head(key->partexprs);
+ for (i = 0; i < key->partnatts; i++)
{
- AttrNumber partattno = get_partition_col_attnum(key, i);
+ Expr *keyCol;
- if (partattno != 0)
+ if (key->partattrs[i] != 0)
{
- if (bms_is_member(partattno - FirstLowInvalidHeapAttributeNumber,
- attnums))
- {
- if (used_in_expr)
- *used_in_expr = false;
- return true;
- }
+ keyCol = (Expr *) makeVar(1,
+ key->partattrs[i],
+ key->parttypid[i],
+ key->parttypmod[i],
+ key->parttypcoll[i],
+ 0);
}
else
{
- /* Arbitrary expression */
- Node *expr = (Node *) lfirst(partexprs_item);
- Bitmapset *expr_attrs = NULL;
-
- /* Find all attributes referenced */
- pull_varattnos(expr, 1, &expr_attrs);
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+ keyCol = copyObject(lfirst(partexprs_item));
partexprs_item = lnext(partexprs_item);
-
- if (bms_overlap(attnums, expr_attrs))
- {
- if (used_in_expr)
- *used_in_expr = true;
- return true;
- }
}
+
+ nulltest = makeNode(NullTest);
+ nulltest->arg = keyCol;
+ nulltest->nulltesttype = IS_NOT_NULL;
+ nulltest->argisrow = false;
+ nulltest->location = -1;
+ result = lappend(result, nulltest);
}
- return false;
+ return result;
}
/*
- * qsort_partition_hbound_cmp
+ * get_qual_for_range
*
- * We sort hash bounds by modulus, then by remainder.
- */
-static int32
-qsort_partition_hbound_cmp(const void *a, const void *b)
-{
- PartitionHashBound *h1 = (*(PartitionHashBound *const *) a);
- PartitionHashBound *h2 = (*(PartitionHashBound *const *) b);
-
- return partition_hbound_cmp(h1->modulus, h1->remainder,
- h2->modulus, h2->remainder);
-}
-
-/*
- * partition_hbound_cmp
+ * Returns an implicit-AND list of expressions to use as a range partition's
+ * constraint, given the partition key and bound structures.
*
- * Compares modulus first, then remainder if modulus are equal.
- */
-static int32
-partition_hbound_cmp(int modulus1, int remainder1, int modulus2, int remainder2)
-{
- if (modulus1 < modulus2)
- return -1;
- if (modulus1 > modulus2)
- return 1;
- if (modulus1 == modulus2 && remainder1 != remainder2)
- return (remainder1 > remainder2) ? 1 : -1;
- return 0;
-}
-
-/*
- * qsort_partition_list_value_cmp
+ * For a multi-column range partition key, say (a, b, c), with (al, bl, cl)
+ * as the lower bound tuple and (au, bu, cu) as the upper bound tuple, we
+ * generate an expression tree of the following form:
+ *
+ * (a IS NOT NULL) and (b IS NOT NULL) and (c IS NOT NULL)
+ * AND
+ * (a > al OR (a = al AND b > bl) OR (a = al AND b = bl AND c >= cl))
+ * AND
+ * (a < au OR (a = au AND b < bu) OR (a = au AND b = bu AND c < cu))
+ *
+ * It is often the case that a prefix of lower and upper bound tuples contains
+ * the same values, for example, (al = au), in which case, we will emit an
+ * expression tree of the following form:
+ *
+ * (a IS NOT NULL) and (b IS NOT NULL) and (c IS NOT NULL)
+ * AND
+ * (a = al)
+ * AND
+ * (b > bl OR (b = bl AND c >= cl))
+ * AND
+ * (b < bu OR (b = bu AND c < cu))
+ *
+ * If a bound datum is either MINVALUE or MAXVALUE, these expressions are
+ * simplified using the fact that any value is greater than MINVALUE and less
+ * than MAXVALUE. So, for example, if cu = MAXVALUE, c < cu is automatically
+ * true, and we need not emit any expression for it, and the last line becomes
+ *
+ * (b < bu) OR (b = bu), which is simplified to (b <= bu)
+ *
+ * In most common cases with only one partition column, say a, the following
+ * expression tree will be generated: a IS NOT NULL AND a >= al AND a < au
*
- * Compare two list partition bound datums
+ * For default partition, it returns the negation of the constraints of all
+ * the other partitions.
+ *
+ * External callers should pass for_default as false; we set it to true only
+ * when recursing.
*/
-static int32
-qsort_partition_list_value_cmp(const void *a, const void *b, void *arg)
+static List *
+get_qual_for_range(Relation parent, PartitionBoundSpec *spec,
+ bool for_default)
{
- Datum val1 = (*(const PartitionListValue **) a)->value,
- val2 = (*(const PartitionListValue **) b)->value;
- PartitionKey key = (PartitionKey) arg;
+ List *result = NIL;
+ ListCell *cell1,
+ *cell2,
+ *partexprs_item,
+ *partexprs_item_saved;
+ int i,
+ j;
+ PartitionRangeDatum *ldatum,
+ *udatum;
+ PartitionKey key = RelationGetPartitionKey(parent);
+ Expr *keyCol;
+ Const *lower_val,
+ *upper_val;
+ List *lower_or_arms,
+ *upper_or_arms;
+ int num_or_arms,
+ current_or_arm;
+ ListCell *lower_or_start_datum,
+ *upper_or_start_datum;
+ bool need_next_lower_arm,
+ need_next_upper_arm;
- return DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- val1, val2));
-}
+ if (spec->is_default)
+ {
+ List *or_expr_args = NIL;
+ PartitionDesc pdesc = RelationGetPartitionDesc(parent);
+ Oid *inhoids = pdesc->oids;
+ int nparts = pdesc->nparts,
+ i;
-/*
- * make_one_range_bound
- *
- * Return a PartitionRangeBound given a list of PartitionRangeDatum elements
- * and a flag telling whether the bound is lower or not. Made into a function
- * because there are multiple sites that want to use this facility.
- */
-static PartitionRangeBound *
-make_one_range_bound(PartitionKey key, int index, List *datums, bool lower)
-{
- PartitionRangeBound *bound;
- ListCell *lc;
- int i;
+ for (i = 0; i < nparts; i++)
+ {
+ Oid inhrelid = inhoids[i];
+ HeapTuple tuple;
+ Datum datum;
+ bool isnull;
+ PartitionBoundSpec *bspec;
- Assert(datums != NIL);
+ tuple = SearchSysCache1(RELOID, inhrelid);
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for relation %u", inhrelid);
- bound = (PartitionRangeBound *) palloc0(sizeof(PartitionRangeBound));
- bound->index = index;
- bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
- bound->kind = (PartitionRangeDatumKind *) palloc0(key->partnatts *
- sizeof(PartitionRangeDatumKind));
- bound->lower = lower;
+ datum = SysCacheGetAttr(RELOID, tuple,
+ Anum_pg_class_relpartbound,
+ &isnull);
- i = 0;
- foreach(lc, datums)
- {
- PartitionRangeDatum *datum = castNode(PartitionRangeDatum, lfirst(lc));
+ Assert(!isnull);
+ bspec = (PartitionBoundSpec *)
+ stringToNode(TextDatumGetCString(datum));
+ if (!IsA(bspec, PartitionBoundSpec))
+ elog(ERROR, "expected PartitionBoundSpec");
+
+ if (!bspec->is_default)
+ {
+ List *part_qual;
+
+ part_qual = get_qual_for_range(parent, bspec, true);
- /* What's contained in this range datum? */
- bound->kind[i] = datum->kind;
+ /*
+ * AND the constraints of the partition and add to
+ * or_expr_args
+ */
+ or_expr_args = lappend(or_expr_args, list_length(part_qual) > 1
+ ? makeBoolExpr(AND_EXPR, part_qual, -1)
+ : linitial(part_qual));
+ }
+ ReleaseSysCache(tuple);
+ }
- if (datum->kind == PARTITION_RANGE_DATUM_VALUE)
+ if (or_expr_args != NIL)
{
- Const *val = castNode(Const, datum->value);
+ Expr *other_parts_constr;
- if (val->constisnull)
- elog(ERROR, "invalid range bound datum");
- bound->datums[i] = val->constvalue;
+ /*
+ * Combine the constraints obtained for non-default partitions
+ * using OR. As requested, each of the OR's args doesn't include
+ * the NOT NULL test for partition keys (which is to avoid its
+ * useless repetition). Add the same now.
+ */
+ other_parts_constr =
+ makeBoolExpr(AND_EXPR,
+ lappend(get_range_nulltest(key),
+ list_length(or_expr_args) > 1
+ ? makeBoolExpr(OR_EXPR, or_expr_args,
+ -1)
+ : linitial(or_expr_args)),
+ -1);
+
+ /*
+ * Finally, the default partition contains everything *NOT*
+ * contained in the non-default partitions.
+ */
+ result = list_make1(makeBoolExpr(NOT_EXPR,
+ list_make1(other_parts_constr), -1));
}
- i++;
+ return result;
}
- return bound;
-}
+ lower_or_start_datum = list_head(spec->lowerdatums);
+ upper_or_start_datum = list_head(spec->upperdatums);
+ num_or_arms = key->partnatts;
-/* Used when sorting range bounds across all range partitions */
-static int32
-qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
-{
- PartitionRangeBound *b1 = (*(PartitionRangeBound *const *) a);
- PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
- PartitionKey key = (PartitionKey) arg;
+ /*
+ * If it is the recursive call for default, we skip the get_range_nulltest
+ * to avoid accumulating the NullTest on the same keys for each partition.
+ */
+ if (!for_default)
+ result = get_range_nulltest(key);
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
-}
+ /*
+ * Iterate over the key columns and check if the corresponding lower and
+ * upper datums are equal using the btree equality operator for the
+ * column's type. If equal, we emit single keyCol = common_value
+ * expression. Starting from the first column for which the corresponding
+ * lower and upper bound datums are not equal, we generate OR expressions
+ * as shown in the function's header comment.
+ */
+ i = 0;
+ partexprs_item = list_head(key->partexprs);
+ partexprs_item_saved = partexprs_item; /* placate compiler */
+ forboth(cell1, spec->lowerdatums, cell2, spec->upperdatums)
+ {
+ EState *estate;
+ MemoryContext oldcxt;
+ Expr *test_expr;
+ ExprState *test_exprstate;
+ Datum test_result;
+ bool isNull;
-/*
- * partition_rbound_cmp
- *
- * Return for two range bounds whether the 1st one (specified in datums1,
- * kind1, and lower1) is <, =, or > the bound specified in *b2.
- *
- * Note that if the values of the two range bounds compare equal, then we take
- * into account whether they are upper or lower bounds, and an upper bound is
- * considered to be smaller than a lower bound. This is important to the way
- * that RelationBuildPartitionDesc() builds the PartitionBoundInfoData
- * structure, which only stores the upper bound of a common boundary between
- * two contiguous partitions.
- */
-static int32
-partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2)
-{
- int32 cmpval = 0; /* placate compiler */
- int i;
- Datum *datums2 = b2->datums;
- PartitionRangeDatumKind *kind2 = b2->kind;
- bool lower2 = b2->lower;
+ ldatum = castNode(PartitionRangeDatum, lfirst(cell1));
+ udatum = castNode(PartitionRangeDatum, lfirst(cell2));
+
+ /*
+ * Since get_range_key_properties() modifies partexprs_item, and we
+ * might need to start over from the previous expression in the later
+ * part of this function, save away the current value.
+ */
+ partexprs_item_saved = partexprs_item;
+
+ get_range_key_properties(key, i, ldatum, udatum,
+ &partexprs_item,
+ &keyCol,
+ &lower_val, &upper_val);
+
+ /*
+ * If either value is NULL, the corresponding partition bound is
+ * either MINVALUE or MAXVALUE, and we treat them as unequal, because
+ * even if they're the same, there is no common value to equate the
+ * key column with.
+ */
+ if (!lower_val || !upper_val)
+ break;
+
+ /* Create the test expression */
+ estate = CreateExecutorState();
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+ test_expr = make_partition_op_expr(key, i, BTEqualStrategyNumber,
+ (Expr *) lower_val,
+ (Expr *) upper_val);
+ fix_opfuncids((Node *) test_expr);
+ test_exprstate = ExecInitExpr(test_expr, NULL);
+ test_result = ExecEvalExprSwitchContext(test_exprstate,
+ GetPerTupleExprContext(estate),
+ &isNull);
+ MemoryContextSwitchTo(oldcxt);
+ FreeExecutorState(estate);
+
+ /* If not equal, go generate the OR expressions */
+ if (!DatumGetBool(test_result))
+ break;
- for (i = 0; i < key->partnatts; i++)
- {
/*
- * First, handle cases where the column is unbounded, which should not
- * invoke the comparison procedure, and should not consider any later
- * columns. Note that the PartitionRangeDatumKind enum elements
- * compare the same way as the values they represent.
+ * The bounds for the last key column can't be equal, because such a
+ * range partition would never be allowed to be defined (it would have
+ * an empty range otherwise).
*/
- if (kind1[i] < kind2[i])
- return -1;
- else if (kind1[i] > kind2[i])
- return 1;
- else if (kind1[i] != PARTITION_RANGE_DATUM_VALUE)
+ if (i == key->partnatts - 1)
+ elog(ERROR, "invalid range bound specification");
- /*
- * The column bounds are both MINVALUE or both MAXVALUE. No later
- * columns should be considered, but we still need to compare
- * whether they are upper or lower bounds.
- */
- break;
+ /* Equal, so generate keyCol = lower_val expression */
+ result = lappend(result,
+ make_partition_op_expr(key, i, BTEqualStrategyNumber,
+ keyCol, (Expr *) lower_val));
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
- datums1[i],
- datums2[i]));
- if (cmpval != 0)
- break;
+ i++;
}
- /*
- * If the comparison is anything other than equal, we're done. If they
- * compare equal though, we still have to consider whether the boundaries
- * are inclusive or exclusive. Exclusive one is considered smaller of the
- * two.
- */
- if (cmpval == 0 && lower1 != lower2)
- cmpval = lower1 ? 1 : -1;
-
- return cmpval;
-}
-
-/*
- * partition_rbound_datum_cmp
- *
- * Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
- * is <, =, or > partition key of tuple (tuple_datums)
- */
-static int32
-partition_rbound_datum_cmp(PartitionKey key,
- Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums, int n_tuple_datums)
-{
- int i;
- int32 cmpval = -1;
+ /* First pair of lower_val and upper_val that are not equal. */
+ lower_or_start_datum = cell1;
+ upper_or_start_datum = cell2;
- for (i = 0; i < n_tuple_datums; i++)
+ /* OR will have as many arms as there are key columns left. */
+ num_or_arms = key->partnatts - i;
+ current_or_arm = 0;
+ lower_or_arms = upper_or_arms = NIL;
+ need_next_lower_arm = need_next_upper_arm = true;
+ while (current_or_arm < num_or_arms)
{
- if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
- return -1;
- else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
- return 1;
-
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
- rb_datums[i],
- tuple_datums[i]));
- if (cmpval != 0)
- break;
- }
-
- return cmpval;
-}
+ List *lower_or_arm_args = NIL,
+ *upper_or_arm_args = NIL;
-/*
- * partition_list_bsearch
- * Returns the index of the greatest bound datum that is less than equal
- * to the given value or -1 if all of the bound datums are greater
- *
- * *is_equal is set to true if the bound datum at the returned index is equal
- * to the input value.
- */
-static int
-partition_list_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- Datum value, bool *is_equal)
-{
- int lo,
- hi,
- mid;
+ /* Restart scan of columns from the i'th one */
+ j = i;
+ partexprs_item = partexprs_item_saved;
- lo = -1;
- hi = boundinfo->ndatums - 1;
- while (lo < hi)
- {
- int32 cmpval;
-
- mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
- boundinfo->datums[mid][0],
- value));
- if (cmpval <= 0)
+ for_both_cell(cell1, lower_or_start_datum, cell2, upper_or_start_datum)
{
- lo = mid;
- *is_equal = (cmpval == 0);
- if (*is_equal)
- break;
- }
- else
- hi = mid - 1;
- }
+ PartitionRangeDatum *ldatum_next = NULL,
+ *udatum_next = NULL;
- return lo;
-}
+ ldatum = castNode(PartitionRangeDatum, lfirst(cell1));
+ if (lnext(cell1))
+ ldatum_next = castNode(PartitionRangeDatum,
+ lfirst(lnext(cell1)));
+ udatum = castNode(PartitionRangeDatum, lfirst(cell2));
+ if (lnext(cell2))
+ udatum_next = castNode(PartitionRangeDatum,
+ lfirst(lnext(cell2)));
+ get_range_key_properties(key, j, ldatum, udatum,
+ &partexprs_item,
+ &keyCol,
+ &lower_val, &upper_val);
-/*
- * partition_range_bsearch
- * Returns the index of the greatest range bound that is less than or
- * equal to the given range bound or -1 if all of the range bounds are
- * greater
- *
- * *is_equal is set to true if the range bound at the returned index is equal
- * to the input range bound
- */
-static int
-partition_range_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- PartitionRangeBound *probe, bool *is_equal)
-{
- int lo,
- hi,
- mid;
+ if (need_next_lower_arm && lower_val)
+ {
+ uint16 strategy;
- lo = -1;
- hi = boundinfo->ndatums - 1;
- while (lo < hi)
- {
- int32 cmpval;
-
- mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
- boundinfo->datums[mid],
- boundinfo->kind[mid],
- (boundinfo->indexes[mid] == -1),
- probe);
- if (cmpval <= 0)
- {
- lo = mid;
- *is_equal = (cmpval == 0);
+ /*
+ * For the non-last columns of this arm, use the EQ operator.
+ * For the last column of this arm, use GT, unless this is the
+ * last column of the whole bound check, or the next bound
+ * datum is MINVALUE, in which case use GE.
+ */
+ if (j - i < current_or_arm)
+ strategy = BTEqualStrategyNumber;
+ else if (j == key->partnatts - 1 ||
+ (ldatum_next &&
+ ldatum_next->kind == PARTITION_RANGE_DATUM_MINVALUE))
+ strategy = BTGreaterEqualStrategyNumber;
+ else
+ strategy = BTGreaterStrategyNumber;
- if (*is_equal)
- break;
- }
- else
- hi = mid - 1;
- }
+ lower_or_arm_args = lappend(lower_or_arm_args,
+ make_partition_op_expr(key, j,
+ strategy,
+ keyCol,
+ (Expr *) lower_val));
+ }
- return lo;
-}
+ if (need_next_upper_arm && upper_val)
+ {
+ uint16 strategy;
-/*
- * partition_range_bsearch
- * Returns the index of the greatest range bound that is less than or
- * equal to the given tuple or -1 if all of the range bounds are greater
- *
- * *is_equal is set to true if the range bound at the returned index is equal
- * to the input tuple.
- */
-static int
-partition_range_datum_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- int nvalues, Datum *values, bool *is_equal)
-{
- int lo,
- hi,
- mid;
+ /*
+ * For the non-last columns of this arm, use the EQ operator.
+ * For the last column of this arm, use LT, unless the next
+ * bound datum is MAXVALUE, in which case use LE.
+ */
+ if (j - i < current_or_arm)
+ strategy = BTEqualStrategyNumber;
+ else if (udatum_next &&
+ udatum_next->kind == PARTITION_RANGE_DATUM_MAXVALUE)
+ strategy = BTLessEqualStrategyNumber;
+ else
+ strategy = BTLessStrategyNumber;
- lo = -1;
- hi = boundinfo->ndatums - 1;
- while (lo < hi)
- {
- int32 cmpval;
-
- mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
- boundinfo->datums[mid],
- boundinfo->kind[mid],
- values,
- nvalues);
- if (cmpval <= 0)
- {
- lo = mid;
- *is_equal = (cmpval == 0);
+ upper_or_arm_args = lappend(upper_or_arm_args,
+ make_partition_op_expr(key, j,
+ strategy,
+ keyCol,
+ (Expr *) upper_val));
+ }
- if (*is_equal)
+ /*
+ * Did we generate enough of OR's arguments? First arm considers
+ * the first of the remaining columns, second arm considers first
+ * two of the remaining columns, and so on.
+ */
+ ++j;
+ if (j - i > current_or_arm)
+ {
+ /*
+ * We must not emit any more arms if the new column that will
+ * be considered is unbounded, or this one was.
+ */
+ if (!lower_val || !ldatum_next ||
+ ldatum_next->kind != PARTITION_RANGE_DATUM_VALUE)
+ need_next_lower_arm = false;
+ if (!upper_val || !udatum_next ||
+ udatum_next->kind != PARTITION_RANGE_DATUM_VALUE)
+ need_next_upper_arm = false;
break;
+ }
}
- else
- hi = mid - 1;
- }
- return lo;
-}
+ if (lower_or_arm_args != NIL)
+ lower_or_arms = lappend(lower_or_arms,
+ list_length(lower_or_arm_args) > 1
+ ? makeBoolExpr(AND_EXPR, lower_or_arm_args, -1)
+ : linitial(lower_or_arm_args));
-/*
- * partition_hash_bsearch
- * Returns the index of the greatest (modulus, remainder) pair that is
- * less than or equal to the given (modulus, remainder) pair or -1 if
- * all of them are greater
- */
-static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
- int modulus, int remainder)
-{
- int lo,
- hi,
- mid;
+ if (upper_or_arm_args != NIL)
+ upper_or_arms = lappend(upper_or_arms,
+ list_length(upper_or_arm_args) > 1
+ ? makeBoolExpr(AND_EXPR, upper_or_arm_args, -1)
+ : linitial(upper_or_arm_args));
- lo = -1;
- hi = boundinfo->ndatums - 1;
- while (lo < hi)
- {
- int32 cmpval,
- bound_modulus,
- bound_remainder;
-
- mid = (lo + hi + 1) / 2;
- bound_modulus = DatumGetInt32(boundinfo->datums[mid][0]);
- bound_remainder = DatumGetInt32(boundinfo->datums[mid][1]);
- cmpval = partition_hbound_cmp(bound_modulus, bound_remainder,
- modulus, remainder);
- if (cmpval <= 0)
- {
- lo = mid;
+ /* If no work to do in the next iteration, break away. */
+ if (!need_next_lower_arm && !need_next_upper_arm)
+ break;
- if (cmpval == 0)
- break;
- }
- else
- hi = mid - 1;
+ ++current_or_arm;
}
- return lo;
-}
+ /*
+ * Generate the OR expressions for each of lower and upper bounds (if
+ * required), and append them to the implicitly ANDed list of
+ * expressions.
+ */
+ if (lower_or_arms != NIL)
+ result = lappend(result,
+ list_length(lower_or_arms) > 1
+ ? makeBoolExpr(OR_EXPR, lower_or_arms, -1)
+ : linitial(lower_or_arms));
+ if (upper_or_arms != NIL)
+ result = lappend(result,
+ list_length(upper_or_arms) > 1
+ ? makeBoolExpr(OR_EXPR, upper_or_arms, -1)
+ : linitial(upper_or_arms));
-/*
- * get_default_oid_from_partdesc
- *
- * Given a partition descriptor, return the OID of the default partition, if
- * one exists; else, return InvalidOid.
- */
-Oid
-get_default_oid_from_partdesc(PartitionDesc partdesc)
-{
- if (partdesc && partdesc->boundinfo &&
- partition_bound_has_default(partdesc->boundinfo))
- return partdesc->oids[partdesc->boundinfo->default_index];
+ /*
+ * As noted above, for a non-default partition we return a list with a
+ * constant TRUE. If the result is NIL during the recursive call for the
+ * default partition, this is the only other partition, and it can hold every
+ * value of the key except NULL; hence we return the NullTest skipped earlier.
+ */
+ if (result == NIL)
+ result = for_default
+ ? get_range_nulltest(key)
+ : list_make1(makeBoolConst(true, false));
- return InvalidOid;
+ return result;
}
/*
@@ -3189,99 +1352,6 @@ get_proposed_default_constraint(List *new_part_constraints)
}
/*
- * get_partition_bound_num_indexes
- *
- * Returns the number of the entries in the partition bound indexes array.
- */
-static int
-get_partition_bound_num_indexes(PartitionBoundInfo bound)
-{
- int num_indexes;
-
- Assert(bound);
-
- switch (bound->strategy)
- {
- case PARTITION_STRATEGY_HASH:
-
- /*
- * The number of the entries in the indexes array is same as the
- * greatest modulus.
- */
- num_indexes = get_greatest_modulus(bound);
- break;
-
- case PARTITION_STRATEGY_LIST:
- num_indexes = bound->ndatums;
- break;
-
- case PARTITION_STRATEGY_RANGE:
- /* Range partitioned table has an extra index. */
- num_indexes = bound->ndatums + 1;
- break;
-
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- (int) bound->strategy);
- }
-
- return num_indexes;
-}
-
-/*
- * get_greatest_modulus
- *
- * Returns the greatest modulus of the hash partition bound. The greatest
- * modulus will be at the end of the datums array because hash partitions are
- * arranged in the ascending order of their modulus and remainders.
- */
-static int
-get_greatest_modulus(PartitionBoundInfo bound)
-{
- Assert(bound && bound->strategy == PARTITION_STRATEGY_HASH);
- Assert(bound->datums && bound->ndatums > 0);
- Assert(DatumGetInt32(bound->datums[bound->ndatums - 1][0]) > 0);
-
- return DatumGetInt32(bound->datums[bound->ndatums - 1][0]);
-}
-
-/*
- * compute_hash_value
- *
- * Compute the hash value for given not null partition key values.
- */
-static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
-{
- int i;
- int nkeys = key->partnatts;
- uint64 rowHash = 0;
- Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
-
- for (i = 0; i < nkeys; i++)
- {
- if (!isnull[i])
- {
- Datum hash;
-
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
-
- /*
- * Compute hash for each datum value by calling respective
- * datatype-specific hash functions of each partition key
- * attribute.
- */
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
-
- /* Form a single 64-bit hash value */
- rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
- }
- }
-
- return rowHash;
-}
-
-/*
* satisfies_hash_partition
*
* This is an SQL-callable function for use in hash partition constraints.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 5d3e923cca..b17abb5c7d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -42,7 +42,6 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
-#include "catalog/partition.h"
#include "catalog/pg_publication.h"
#include "commands/matview.h"
#include "commands/trigger.h"
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3ebc6..cc77ba3701 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -20,6 +20,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "utils/lsyscache.h"
+#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..8c7caabbc7 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -33,7 +33,6 @@
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/sysattr.h"
-#include "catalog/partition.h"
#include "catalog/pg_inherits_fn.h"
#include "catalog/pg_type.h"
#include "miscadmin.h"
@@ -49,6 +48,7 @@
#include "parser/parse_coerce.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/selfuncs.h"
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3bb468bdad..107301ebc7 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -24,7 +24,6 @@
#include "access/sysattr.h"
#include "catalog/dependency.h"
#include "catalog/indexing.h"
-#include "catalog/partition.h"
#include "catalog/pg_aggregate.h"
#include "catalog/pg_am.h"
#include "catalog/pg_authid.h"
diff --git a/src/backend/utils/cache/Makefile b/src/backend/utils/cache/Makefile
index a943f8ea4b..94511eaf54 100644
--- a/src/backend/utils/cache/Makefile
+++ b/src/backend/utils/cache/Makefile
@@ -12,8 +12,8 @@ subdir = src/backend/utils/cache
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = attoptcache.o catcache.o evtcache.o inval.o plancache.o relcache.o \
- relmapper.o relfilenodemap.o spccache.o syscache.o lsyscache.o \
- typcache.o ts_cache.o
+OBJS = attoptcache.o catcache.o evtcache.o inval.o plancache.o partcache.o \
+ relcache.o relmapper.o relfilenodemap.o spccache.o syscache.o \
+ lsyscache.o typcache.o ts_cache.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/cache/partcache.c b/src/backend/utils/cache/partcache.c
new file mode 100644
index 0000000000..5df180b015
--- /dev/null
+++ b/src/backend/utils/cache/partcache.c
@@ -0,0 +1,2114 @@
+/*-------------------------------------------------------------------------
+ *
+ * partcache.c
+ * Partitioning related cache data structures and manipulation functions
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/utils/cache/partcache.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/nbtree.h"
+#include "access/sysattr.h"
+#include "catalog/partition.h"
+#include "catalog/pg_inherits_fn.h"
+#include "catalog/pg_opclass.h"
+#include "catalog/pg_partitioned_table.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/var.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/hashutils.h"
+#include "utils/memutils.h"
+#include "utils/partcache.h"
+#include "utils/rel.h"
+#include "utils/ruleutils.h"
+#include "utils/syscache.h"
+
+/*
+ * When qsort'ing partition bounds after reading from the catalog, each bound
+ * is represented with one of the following structs.
+ */
+
+/* One bound of a hash partition */
+typedef struct PartitionHashBound
+{
+ int modulus;
+ int remainder;
+ int index;
+} PartitionHashBound;
+
+/* One value coming from some (index'th) list partition */
+typedef struct PartitionListValue
+{
+ int index;
+ Datum value;
+} PartitionListValue;
+
+/* One bound of a range partition */
+typedef struct PartitionRangeBound
+{
+ int index;
+ Datum *datums; /* range bound datums */
+ PartitionRangeDatumKind *kind; /* the kind of each datum */
+ bool lower; /* this is the lower (vs upper) bound */
+} PartitionRangeBound;
+
+static List *generate_partition_qual(Relation rel);
+
+static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
+ int remainder2);
+static int32 qsort_partition_hbound_cmp(const void *a, const void *b);
+static int32 qsort_partition_list_value_cmp(const void *a, const void *b,
+ void *arg);
+static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
+ List *datums, bool lower);
+static int32 partition_rbound_cmp(PartitionKey key,
+ Datum *datums1, PartitionRangeDatumKind *kind1,
+ bool lower1, PartitionRangeBound *b2);
+static int32 qsort_partition_rbound_cmp(const void *a, const void *b,
+ void *arg);
+
+static int partition_list_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal);
+static int partition_range_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ PartitionRangeBound *probe, bool *is_equal);
+static int32 partition_rbound_datum_cmp(PartitionKey key,
+ Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+ Datum *tuple_datums, int n_tuple_datums);
+static int partition_range_datum_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal);
+static int partition_hash_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int modulus, int remainder);
+
+static int get_partition_bound_num_indexes(PartitionBoundInfo b);
+
+/*
+ * RelationBuildPartitionKey
+ * Build and attach to relcache partition key data of relation
+ *
+ * Partitioning key data is a complex structure; to avoid complicated logic to
+ * free individual elements whenever the relcache entry is flushed, we give it
+ * its own memory context, child of CacheMemoryContext, which can easily be
+ * deleted on its own. To avoid leaking memory in that context in case of an
+ * error partway through this function, the context is initially created as a
+ * child of CurTransactionContext and only re-parented to CacheMemoryContext
+ * at the end, when no further errors are possible. Also, we don't make this
+ * context the current context except in very brief code sections, out of fear
+ * that some of our callees allocate memory on their own which would be leaked
+ * permanently.
+ */
+void
+RelationBuildPartitionKey(Relation relation)
+{
+ Form_pg_partitioned_table form;
+ HeapTuple tuple;
+ bool isnull;
+ int i;
+ PartitionKey key;
+ AttrNumber *attrs;
+ oidvector *opclass;
+ oidvector *collation;
+ ListCell *partexprs_item;
+ Datum datum;
+ MemoryContext partkeycxt,
+ oldcxt;
+ int16 procnum;
+
+ tuple = SearchSysCache1(PARTRELID,
+ ObjectIdGetDatum(RelationGetRelid(relation)));
+
+ /*
+ * The following happens when we have created our pg_class entry but not
+ * the pg_partitioned_table entry yet.
+ */
+ if (!HeapTupleIsValid(tuple))
+ return;
+
+ partkeycxt = AllocSetContextCreateExtended(CurTransactionContext,
+ RelationGetRelationName(relation),
+ MEMCONTEXT_COPY_NAME,
+ ALLOCSET_SMALL_SIZES);
+
+ key = (PartitionKey) MemoryContextAllocZero(partkeycxt,
+ sizeof(PartitionKeyData));
+
+ /* Fixed-length attributes */
+ form = (Form_pg_partitioned_table) GETSTRUCT(tuple);
+ key->strategy = form->partstrat;
+ key->partnatts = form->partnatts;
+
+ /*
+ * We can rely on the first variable-length attribute being mapped to the
+ * relevant field of the catalog's C struct, because all previous
+ * attributes are non-nullable and fixed-length.
+ */
+ attrs = form->partattrs.values;
+
+ /* But use the hard way to retrieve further variable-length attributes */
+ /* Operator class */
+ datum = SysCacheGetAttr(PARTRELID, tuple,
+ Anum_pg_partitioned_table_partclass, &isnull);
+ Assert(!isnull);
+ opclass = (oidvector *) DatumGetPointer(datum);
+
+ /* Collation */
+ datum = SysCacheGetAttr(PARTRELID, tuple,
+ Anum_pg_partitioned_table_partcollation, &isnull);
+ Assert(!isnull);
+ collation = (oidvector *) DatumGetPointer(datum);
+
+ /* Expressions */
+ datum = SysCacheGetAttr(PARTRELID, tuple,
+ Anum_pg_partitioned_table_partexprs, &isnull);
+ if (!isnull)
+ {
+ char *exprString;
+ Node *expr;
+
+ exprString = TextDatumGetCString(datum);
+ expr = stringToNode(exprString);
+ pfree(exprString);
+
+ /*
+ * Run the expressions through const-simplification since the planner
+ * will be comparing them to similarly-processed qual clause operands,
+ * and may fail to detect valid matches without this step; fix
+ * opfuncids while at it. We don't need to bother with
+ * canonicalize_qual() though, because partition expressions are not
+ * full-fledged qualification clauses.
+ */
+ expr = eval_const_expressions(NULL, expr);
+ fix_opfuncids(expr);
+
+ oldcxt = MemoryContextSwitchTo(partkeycxt);
+ key->partexprs = (List *) copyObject(expr);
+ MemoryContextSwitchTo(oldcxt);
+ }
+
+ oldcxt = MemoryContextSwitchTo(partkeycxt);
+ key->partattrs = (AttrNumber *) palloc0(key->partnatts * sizeof(AttrNumber));
+ key->partopfamily = (Oid *) palloc0(key->partnatts * sizeof(Oid));
+ key->partopcintype = (Oid *) palloc0(key->partnatts * sizeof(Oid));
+ key->partsupfunc = (FmgrInfo *) palloc0(key->partnatts * sizeof(FmgrInfo));
+
+ key->partcollation = (Oid *) palloc0(key->partnatts * sizeof(Oid));
+
+ /* Gather type and collation info as well */
+ key->parttypid = (Oid *) palloc0(key->partnatts * sizeof(Oid));
+ key->parttypmod = (int32 *) palloc0(key->partnatts * sizeof(int32));
+ key->parttyplen = (int16 *) palloc0(key->partnatts * sizeof(int16));
+ key->parttypbyval = (bool *) palloc0(key->partnatts * sizeof(bool));
+ key->parttypalign = (char *) palloc0(key->partnatts * sizeof(char));
+ key->parttypcoll = (Oid *) palloc0(key->partnatts * sizeof(Oid));
+ MemoryContextSwitchTo(oldcxt);
+
+ /* determine support function number to search for */
+ procnum = (key->strategy == PARTITION_STRATEGY_HASH) ?
+ HASHEXTENDED_PROC : BTORDER_PROC;
+
+ /* Copy partattrs and fill other per-attribute info */
+ memcpy(key->partattrs, attrs, key->partnatts * sizeof(int16));
+ partexprs_item = list_head(key->partexprs);
+ for (i = 0; i < key->partnatts; i++)
+ {
+ AttrNumber attno = key->partattrs[i];
+ HeapTuple opclasstup;
+ Form_pg_opclass opclassform;
+ Oid funcid;
+
+ /* Collect opfamily information */
+ opclasstup = SearchSysCache1(CLAOID,
+ ObjectIdGetDatum(opclass->values[i]));
+ if (!HeapTupleIsValid(opclasstup))
+ elog(ERROR, "cache lookup failed for opclass %u", opclass->values[i]);
+
+ opclassform = (Form_pg_opclass) GETSTRUCT(opclasstup);
+ key->partopfamily[i] = opclassform->opcfamily;
+ key->partopcintype[i] = opclassform->opcintype;
+
+ /* Get a support function for the specified opfamily and datatypes */
+ funcid = get_opfamily_proc(opclassform->opcfamily,
+ opclassform->opcintype,
+ opclassform->opcintype,
+ procnum);
+ if (!OidIsValid(funcid))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("operator class \"%s\" of access method %s is missing support function %d for type %s",
+ NameStr(opclassform->opcname),
+ (key->strategy == PARTITION_STRATEGY_HASH) ?
+ "hash" : "btree",
+ procnum,
+ format_type_be(opclassform->opcintype))));
+
+ fmgr_info(funcid, &key->partsupfunc[i]);
+
+ /* Collation */
+ key->partcollation[i] = collation->values[i];
+
+ /* Collect type information */
+ if (attno != 0)
+ {
+ Form_pg_attribute att = TupleDescAttr(relation->rd_att, attno - 1);
+
+ key->parttypid[i] = att->atttypid;
+ key->parttypmod[i] = att->atttypmod;
+ key->parttypcoll[i] = att->attcollation;
+ }
+ else
+ {
+ if (partexprs_item == NULL)
+ elog(ERROR, "wrong number of partition key expressions");
+
+ key->parttypid[i] = exprType(lfirst(partexprs_item));
+ key->parttypmod[i] = exprTypmod(lfirst(partexprs_item));
+ key->parttypcoll[i] = exprCollation(lfirst(partexprs_item));
+
+ partexprs_item = lnext(partexprs_item);
+ }
+ get_typlenbyvalalign(key->parttypid[i],
+ &key->parttyplen[i],
+ &key->parttypbyval[i],
+ &key->parttypalign[i]);
+
+ ReleaseSysCache(opclasstup);
+ }
+
+ ReleaseSysCache(tuple);
+
+ /*
+ * Success --- reparent our context and make the relcache point to the
+ * newly constructed key
+ */
+ MemoryContextSetParent(partkeycxt, CacheMemoryContext);
+ relation->rd_partkeycxt = partkeycxt;
+ relation->rd_partkey = key;
+}
+
+/*
+ * RelationBuildPartitionDesc
+ * Form rel's partition descriptor
+ *
+ * Not flushed from the cache by RelationClearRelation() unless changed because
+ * of addition or removal of partition.
+ */
+void
+RelationBuildPartitionDesc(Relation rel)
+{
+ List *inhoids,
+ *partoids;
+ Oid *oids = NULL;
+ List *boundspecs = NIL;
+ ListCell *cell;
+ int i,
+ nparts;
+ PartitionKey key = RelationGetPartitionKey(rel);
+ PartitionDesc result;
+ MemoryContext oldcxt;
+
+ int ndatums = 0;
+ int default_index = -1;
+
+ /* Hash partitioning specific */
+ PartitionHashBound **hbounds = NULL;
+
+ /* List partitioning specific */
+ PartitionListValue **all_values = NULL;
+ int null_index = -1;
+
+ /* Range partitioning specific */
+ PartitionRangeBound **rbounds = NULL;
+
+ /*
+ * The following could happen in situations where rel has a pg_class entry
+ * but not the pg_partitioned_table entry yet.
+ */
+ if (key == NULL)
+ return;
+
+ /* Get partition oids from pg_inherits */
+ inhoids = find_inheritance_children(RelationGetRelid(rel), NoLock);
+
+ /* Collect bound spec nodes in a list */
+ i = 0;
+ partoids = NIL;
+ foreach(cell, inhoids)
+ {
+ Oid inhrelid = lfirst_oid(cell);
+ HeapTuple tuple;
+ Datum datum;
+ bool isnull;
+ Node *boundspec;
+
+ tuple = SearchSysCache1(RELOID, inhrelid);
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for relation %u", inhrelid);
+
+ /*
+ * It is possible that the pg_class tuple of a partition has not been
+ * updated yet to set its relpartbound field. The only case where
+ * this happens is when we open the parent relation to check using its
+ * partition descriptor that a new partition's bound does not overlap
+ * some existing partition.
+ */
+ if (!((Form_pg_class) GETSTRUCT(tuple))->relispartition)
+ {
+ ReleaseSysCache(tuple);
+ continue;
+ }
+
+ datum = SysCacheGetAttr(RELOID, tuple,
+ Anum_pg_class_relpartbound,
+ &isnull);
+ Assert(!isnull);
+ boundspec = (Node *) stringToNode(TextDatumGetCString(datum));
+
+ /*
+ * Sanity check: If the PartitionBoundSpec says this is the default
+ * partition, its OID should correspond to whatever's stored in
+ * pg_partitioned_table.partdefid; if not, the catalog is corrupt.
+ */
+ if (castNode(PartitionBoundSpec, boundspec)->is_default)
+ {
+ Oid partdefid;
+
+ partdefid = get_default_partition_oid(RelationGetRelid(rel));
+ if (partdefid != inhrelid)
+ elog(ERROR, "expected partdefid %u, but got %u",
+ inhrelid, partdefid);
+ }
+
+ boundspecs = lappend(boundspecs, boundspec);
+ partoids = lappend_oid(partoids, inhrelid);
+ ReleaseSysCache(tuple);
+ }
+
+ nparts = list_length(partoids);
+
+ if (nparts > 0)
+ {
+ oids = (Oid *) palloc(nparts * sizeof(Oid));
+ i = 0;
+ foreach(cell, partoids)
+ oids[i++] = lfirst_oid(cell);
+
+ /* Convert from node to the internal representation */
+ if (key->strategy == PARTITION_STRATEGY_HASH)
+ {
+ ndatums = nparts;
+ hbounds = (PartitionHashBound **)
+ palloc(nparts * sizeof(PartitionHashBound *));
+
+ i = 0;
+ foreach(cell, boundspecs)
+ {
+ PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
+ lfirst(cell));
+
+ if (spec->strategy != PARTITION_STRATEGY_HASH)
+ elog(ERROR, "invalid strategy in partition bound spec");
+
+ hbounds[i] = (PartitionHashBound *)
+ palloc(sizeof(PartitionHashBound));
+
+ hbounds[i]->modulus = spec->modulus;
+ hbounds[i]->remainder = spec->remainder;
+ hbounds[i]->index = i;
+ i++;
+ }
+
+ /* Sort all the bounds in ascending order */
+ qsort(hbounds, nparts, sizeof(PartitionHashBound *),
+ qsort_partition_hbound_cmp);
+ }
+ else if (key->strategy == PARTITION_STRATEGY_LIST)
+ {
+ List *non_null_values = NIL;
+
+ /*
+ * Create a unified list of non-null values across all partitions.
+ */
+ i = 0;
+ null_index = -1;
+ foreach(cell, boundspecs)
+ {
+ PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
+ lfirst(cell));
+ ListCell *c;
+
+ if (spec->strategy != PARTITION_STRATEGY_LIST)
+ elog(ERROR, "invalid strategy in partition bound spec");
+
+ /*
+ * Note the index of the partition bound spec for the default
+ * partition. There's no datum to add to the list of non-null
+ * datums for this partition.
+ */
+ if (spec->is_default)
+ {
+ default_index = i;
+ i++;
+ continue;
+ }
+
+ foreach(c, spec->listdatums)
+ {
+ Const *val = castNode(Const, lfirst(c));
+ PartitionListValue *list_value = NULL;
+
+ if (!val->constisnull)
+ {
+ list_value = (PartitionListValue *)
+ palloc0(sizeof(PartitionListValue));
+ list_value->index = i;
+ list_value->value = val->constvalue;
+ }
+ else
+ {
+ /*
+ * Never put a null into the values array, flag
+ * instead for the code further down below where we
+ * construct the actual relcache struct.
+ */
+ if (null_index != -1)
+ elog(ERROR, "found null more than once");
+ null_index = i;
+ }
+
+ if (list_value)
+ non_null_values = lappend(non_null_values,
+ list_value);
+ }
+
+ i++;
+ }
+
+ ndatums = list_length(non_null_values);
+
+ /*
+ * Collect all list values in one array. Alongside the value, we
+ * also save the index of the partition the value comes from.
+ */
+ all_values = (PartitionListValue **) palloc(ndatums *
+ sizeof(PartitionListValue *));
+ i = 0;
+ foreach(cell, non_null_values)
+ {
+ PartitionListValue *src = lfirst(cell);
+
+ all_values[i] = (PartitionListValue *)
+ palloc(sizeof(PartitionListValue));
+ all_values[i]->value = src->value;
+ all_values[i]->index = src->index;
+ i++;
+ }
+
+ qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
+ qsort_partition_list_value_cmp, (void *) key);
+ }
+ else if (key->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ int k;
+ PartitionRangeBound **all_bounds,
+ *prev;
+
+ all_bounds = (PartitionRangeBound **) palloc0(2 * nparts *
+ sizeof(PartitionRangeBound *));
+
+ /*
+ * Create a unified list of range bounds across all the
+ * partitions.
+ */
+ i = ndatums = 0;
+ foreach(cell, boundspecs)
+ {
+ PartitionBoundSpec *spec = castNode(PartitionBoundSpec,
+ lfirst(cell));
+ PartitionRangeBound *lower,
+ *upper;
+
+ if (spec->strategy != PARTITION_STRATEGY_RANGE)
+ elog(ERROR, "invalid strategy in partition bound spec");
+
+ /*
+ * Note the index of the partition bound spec for the default
+ * partition. There's no datum to add to the all_bounds array
+ * for this partition.
+ */
+ if (spec->is_default)
+ {
+ default_index = i++;
+ continue;
+ }
+
+ lower = make_one_range_bound(key, i, spec->lowerdatums,
+ true);
+ upper = make_one_range_bound(key, i, spec->upperdatums,
+ false);
+ all_bounds[ndatums++] = lower;
+ all_bounds[ndatums++] = upper;
+ i++;
+ }
+
+ Assert(ndatums == nparts * 2 ||
+ (default_index != -1 && ndatums == (nparts - 1) * 2));
+
+ /* Sort all the bounds in ascending order */
+ qsort_arg(all_bounds, ndatums,
+ sizeof(PartitionRangeBound *),
+ qsort_partition_rbound_cmp,
+ (void *) key);
+
+ /* Save distinct bounds from all_bounds into rbounds. */
+ rbounds = (PartitionRangeBound **)
+ palloc(ndatums * sizeof(PartitionRangeBound *));
+ k = 0;
+ prev = NULL;
+ for (i = 0; i < ndatums; i++)
+ {
+ PartitionRangeBound *cur = all_bounds[i];
+ bool is_distinct = false;
+ int j;
+
+ /* Is the current bound distinct from the previous one? */
+ for (j = 0; j < key->partnatts; j++)
+ {
+ Datum cmpval;
+
+ if (prev == NULL || cur->kind[j] != prev->kind[j])
+ {
+ is_distinct = true;
+ break;
+ }
+
+ /*
+ * If the bounds are both MINVALUE or MAXVALUE, stop now
+ * and treat them as equal, since any values after this
+ * point must be ignored.
+ */
+ if (cur->kind[j] != PARTITION_RANGE_DATUM_VALUE)
+ break;
+
+ cmpval = FunctionCall2Coll(&key->partsupfunc[j],
+ key->partcollation[j],
+ cur->datums[j],
+ prev->datums[j]);
+ if (DatumGetInt32(cmpval) != 0)
+ {
+ is_distinct = true;
+ break;
+ }
+ }
+
+ /*
+ * Save the bound into the temporary array rbounds only if it
+ * is distinct; rbounds is later copied into the boundinfo
+ * datums array.
+ */
+ if (is_distinct)
+ rbounds[k++] = all_bounds[i];
+
+ prev = cur;
+ }
+
+ /* Update ndatums to hold the count of distinct datums. */
+ ndatums = k;
+ }
+ else
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ /* Now build the actual relcache partition descriptor */
+ rel->rd_pdcxt = AllocSetContextCreateExtended(CacheMemoryContext,
+ RelationGetRelationName(rel),
+ MEMCONTEXT_COPY_NAME,
+ ALLOCSET_DEFAULT_SIZES);
+ oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
+
+ result = (PartitionDescData *) palloc0(sizeof(PartitionDescData));
+ result->nparts = nparts;
+ if (nparts > 0)
+ {
+ PartitionBoundInfo boundinfo;
+ int *mapping;
+ int next_index = 0;
+
+ result->oids = (Oid *) palloc0(nparts * sizeof(Oid));
+
+ boundinfo = (PartitionBoundInfoData *)
+ palloc0(sizeof(PartitionBoundInfoData));
+ boundinfo->strategy = key->strategy;
+ boundinfo->default_index = -1;
+ boundinfo->ndatums = ndatums;
+ boundinfo->null_index = -1;
+ boundinfo->datums = (Datum **) palloc0(ndatums * sizeof(Datum *));
+
+ /* Initialize mapping array with invalid values */
+ mapping = (int *) palloc(sizeof(int) * nparts);
+ for (i = 0; i < nparts; i++)
+ mapping[i] = -1;
+
+ switch (key->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ /* Moduli are stored in ascending order */
+ int greatest_modulus = hbounds[ndatums - 1]->modulus;
+
+ boundinfo->indexes = (int *) palloc(greatest_modulus *
+ sizeof(int));
+
+ for (i = 0; i < greatest_modulus; i++)
+ boundinfo->indexes[i] = -1;
+
+ for (i = 0; i < nparts; i++)
+ {
+ int modulus = hbounds[i]->modulus;
+ int remainder = hbounds[i]->remainder;
+
+ boundinfo->datums[i] = (Datum *) palloc(2 *
+ sizeof(Datum));
+ boundinfo->datums[i][0] = Int32GetDatum(modulus);
+ boundinfo->datums[i][1] = Int32GetDatum(remainder);
+
+ while (remainder < greatest_modulus)
+ {
+ /* overlap? */
+ Assert(boundinfo->indexes[remainder] == -1);
+ boundinfo->indexes[remainder] = i;
+ remainder += modulus;
+ }
+
+ mapping[hbounds[i]->index] = i;
+ pfree(hbounds[i]);
+ }
+ pfree(hbounds);
+ break;
+ }
+
+ case PARTITION_STRATEGY_LIST:
+ {
+ boundinfo->indexes = (int *) palloc(ndatums * sizeof(int));
+
+ /*
+ * Copy values. Indexes of individual values are mapped
+ * to canonical values so that they match for any two list
+ * partitioned tables with same number of partitions and
+ * same lists per partition. One way to canonicalize is
+ * to assign the index in all_values[] of the smallest
+ * value of each partition, as the index of all of the
+ * partition's values.
+ */
+ for (i = 0; i < ndatums; i++)
+ {
+ boundinfo->datums[i] = (Datum *) palloc(sizeof(Datum));
+ boundinfo->datums[i][0] = datumCopy(all_values[i]->value,
+ key->parttypbyval[0],
+ key->parttyplen[0]);
+
+ /* If the old index has no mapping, assign one */
+ if (mapping[all_values[i]->index] == -1)
+ mapping[all_values[i]->index] = next_index++;
+
+ boundinfo->indexes[i] = mapping[all_values[i]->index];
+ }
+
+ /*
+ * If the null-accepting partition has no mapped index yet,
+ * assign one. This could happen if such a partition
+ * accepts only null and hence is not covered by the above
+ * loop, which only handled non-null values.
+ */
+ if (null_index != -1)
+ {
+ Assert(null_index >= 0);
+ if (mapping[null_index] == -1)
+ mapping[null_index] = next_index++;
+ boundinfo->null_index = mapping[null_index];
+ }
+
+ /* Assign mapped index for the default partition. */
+ if (default_index != -1)
+ {
+ /*
+ * The default partition accepts any value not
+ * specified in the lists of other partitions, hence
+ * it should not get a mapped index while assigning
+ * those for non-null datums.
+ */
+ Assert(default_index >= 0 &&
+ mapping[default_index] == -1);
+ mapping[default_index] = next_index++;
+ boundinfo->default_index = mapping[default_index];
+ }
+
+ /* All partitions must now have a valid mapping */
+ Assert(next_index == nparts);
+ break;
+ }
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ boundinfo->kind = (PartitionRangeDatumKind **)
+ palloc(ndatums *
+ sizeof(PartitionRangeDatumKind *));
+ boundinfo->indexes = (int *) palloc((ndatums + 1) *
+ sizeof(int));
+
+ for (i = 0; i < ndatums; i++)
+ {
+ int j;
+
+ boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
+ sizeof(Datum));
+ boundinfo->kind[i] = (PartitionRangeDatumKind *)
+ palloc(key->partnatts *
+ sizeof(PartitionRangeDatumKind));
+ for (j = 0; j < key->partnatts; j++)
+ {
+ if (rbounds[i]->kind[j] == PARTITION_RANGE_DATUM_VALUE)
+ boundinfo->datums[i][j] =
+ datumCopy(rbounds[i]->datums[j],
+ key->parttypbyval[j],
+ key->parttyplen[j]);
+ boundinfo->kind[i][j] = rbounds[i]->kind[j];
+ }
+
+ /*
+ * There is no mapping for invalid indexes.
+ *
+ * Any lower bounds in the rbounds array have invalid
+ * indexes assigned, because the values between the
+ * previous bound (if there is one) and this (lower)
+ * bound are not part of the range of any existing
+ * partition.
+ */
+ if (rbounds[i]->lower)
+ boundinfo->indexes[i] = -1;
+ else
+ {
+ int orig_index = rbounds[i]->index;
+
+ /* If the old index has no mapping, assign one */
+ if (mapping[orig_index] == -1)
+ mapping[orig_index] = next_index++;
+
+ boundinfo->indexes[i] = mapping[orig_index];
+ }
+ }
+
+ /* Assign mapped index for the default partition. */
+ if (default_index != -1)
+ {
+ Assert(default_index >= 0 && mapping[default_index] == -1);
+ mapping[default_index] = next_index++;
+ boundinfo->default_index = mapping[default_index];
+ }
+ boundinfo->indexes[i] = -1;
+ break;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ result->boundinfo = boundinfo;
+
+ /*
+ * Now assign OIDs from the original array into mapped indexes of the
+ * result array. Order of OIDs in the former is defined by the
+ * catalog scan that retrieved them, whereas that in the latter is
+ * defined by canonicalized representation of the partition bounds.
+ */
+ for (i = 0; i < nparts; i++)
+ result->oids[mapping[i]] = oids[i];
+ pfree(mapping);
+ }
+
+ MemoryContextSwitchTo(oldcxt);
+ rel->rd_partdesc = result;
+}
+
+/*
+ * Are two partition bound collections logically equal?
+ *
+ * Used in the keep logic of relcache.c (ie, in RelationClearRelation()).
+ * This is also useful when b1 and b2 are bound collections of two separate
+ * relations, respectively, because PartitionBoundInfo is a canonical
+ * representation of partition bounds.
+ */
+bool
+partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
+ PartitionBoundInfo b1, PartitionBoundInfo b2)
+{
+ int i;
+
+ if (b1->strategy != b2->strategy)
+ return false;
+
+ if (b1->ndatums != b2->ndatums)
+ return false;
+
+ if (b1->null_index != b2->null_index)
+ return false;
+
+ if (b1->default_index != b2->default_index)
+ return false;
+
+ if (b1->strategy == PARTITION_STRATEGY_HASH)
+ {
+ int greatest_modulus = get_greatest_modulus(b1);
+
+ /*
+ * If two hash partitioned tables have different greatest moduli,
+ * their partition schemes don't match.
+ */
+ if (greatest_modulus != get_greatest_modulus(b2))
+ return false;
+
+ /*
+ * We arrange the partitions in the ascending order of their modulus
+ * and remainders. Also every modulus is a factor of the next larger
+ * modulus. Therefore we can safely store index of a given partition
+ * in indexes array at remainder of that partition. Also entries at
+ * (remainder + N * modulus) positions in indexes array are all same
+ * for (modulus, remainder) specification for any partition. Thus
+ * datums array from both the given bounds are same, if and only if
+ * their indexes array will be same. So, it suffices to compare
+ * indexes array.
+ */
+ for (i = 0; i < greatest_modulus; i++)
+ if (b1->indexes[i] != b2->indexes[i])
+ return false;
+
+#ifdef USE_ASSERT_CHECKING
+
+ /*
+ * Nonetheless make sure that the bounds are indeed same when the
+ * indexes match. Hash partition bound stores modulus and remainder
+ * at b1->datums[i][0] and b1->datums[i][1] position respectively.
+ */
+ for (i = 0; i < b1->ndatums; i++)
+ Assert((b1->datums[i][0] == b2->datums[i][0] &&
+ b1->datums[i][1] == b2->datums[i][1]));
+#endif
+ }
+ else
+ {
+ for (i = 0; i < b1->ndatums; i++)
+ {
+ int j;
+
+ for (j = 0; j < partnatts; j++)
+ {
+ /* For range partitions, the bounds might not be finite. */
+ if (b1->kind != NULL)
+ {
+ /* The different kinds of bound all differ from each other */
+ if (b1->kind[i][j] != b2->kind[i][j])
+ return false;
+
+ /*
+ * Non-finite bounds are equal without further
+ * examination.
+ */
+ if (b1->kind[i][j] != PARTITION_RANGE_DATUM_VALUE)
+ continue;
+ }
+
+ /*
+ * Compare the actual values. Note that it would be both
+ * incorrect and unsafe to invoke the comparison operator
+ * derived from the partitioning specification here. It would
+ * be incorrect because we want the relcache entry to be
+ * updated for ANY change to the partition bounds, not just
+ * those that the partitioning operator thinks are
+ * significant. It would be unsafe because we might reach
+ * this code in the context of an aborted transaction, and an
+ * arbitrary partitioning operator might not be safe in that
+ * context. datumIsEqual() should be simple enough to be
+ * safe.
+ */
+ if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+ parttypbyval[j], parttyplen[j]))
+ return false;
+ }
+
+ if (b1->indexes[i] != b2->indexes[i])
+ return false;
+ }
+
+ /* There are ndatums+1 indexes in case of range partitions */
+ if (b1->strategy == PARTITION_STRATEGY_RANGE &&
+ b1->indexes[i] != b2->indexes[i])
+ return false;
+ }
+ return true;
+}
+
+/*
+ * Return a copy of given PartitionBoundInfo structure. The data types of bounds
+ * are described by given partition key specification.
+ */
+PartitionBoundInfo
+partition_bounds_copy(PartitionBoundInfo src,
+ PartitionKey key)
+{
+ PartitionBoundInfo dest;
+ int i;
+ int ndatums;
+ int partnatts;
+ int num_indexes;
+
+ dest = (PartitionBoundInfo) palloc(sizeof(PartitionBoundInfoData));
+
+ dest->strategy = src->strategy;
+ ndatums = dest->ndatums = src->ndatums;
+ partnatts = key->partnatts;
+
+ num_indexes = get_partition_bound_num_indexes(src);
+
+ /* List partitioned tables have only a single partition key. */
+ Assert(key->strategy != PARTITION_STRATEGY_LIST || partnatts == 1);
+
+ dest->datums = (Datum **) palloc(sizeof(Datum *) * ndatums);
+
+ if (src->kind != NULL)
+ {
+ dest->kind = (PartitionRangeDatumKind **) palloc(ndatums *
+ sizeof(PartitionRangeDatumKind *));
+ for (i = 0; i < ndatums; i++)
+ {
+ dest->kind[i] = (PartitionRangeDatumKind *) palloc(partnatts *
+ sizeof(PartitionRangeDatumKind));
+
+ memcpy(dest->kind[i], src->kind[i],
+ sizeof(PartitionRangeDatumKind) * key->partnatts);
+ }
+ }
+ else
+ dest->kind = NULL;
+
+ for (i = 0; i < ndatums; i++)
+ {
+ int j;
+
+ /*
+ * For a hash partition, the datums array will have two elements:
+ * modulus and remainder.
+ */
+ bool hash_part = (key->strategy == PARTITION_STRATEGY_HASH);
+ int natts = hash_part ? 2 : partnatts;
+
+ dest->datums[i] = (Datum *) palloc(sizeof(Datum) * natts);
+
+ for (j = 0; j < natts; j++)
+ {
+ bool byval;
+ int typlen;
+
+ if (hash_part)
+ {
+ typlen = sizeof(int32); /* Always int4 */
+ byval = true; /* int4 is pass-by-value */
+ }
+ else
+ {
+ byval = key->parttypbyval[j];
+ typlen = key->parttyplen[j];
+ }
+
+ if (dest->kind == NULL ||
+ dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
+ dest->datums[i][j] = datumCopy(src->datums[i][j],
+ byval, typlen);
+ }
+ }
+
+ dest->indexes = (int *) palloc(sizeof(int) * num_indexes);
+ memcpy(dest->indexes, src->indexes, sizeof(int) * num_indexes);
+
+ dest->null_index = src->null_index;
+ dest->default_index = src->default_index;
+
+ return dest;
+}
+
+/*
+ * check_new_partition_bound
+ *
+ * Checks if the new partition's bound overlaps any of the existing partitions
+ * of parent. Also performs additional checks as necessary per strategy.
+ */
+void
+check_new_partition_bound(char *relname, Relation parent,
+ PartitionBoundSpec *spec)
+{
+ PartitionKey key = RelationGetPartitionKey(parent);
+ PartitionDesc partdesc = RelationGetPartitionDesc(parent);
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ ParseState *pstate = make_parsestate(NULL);
+ int with = -1;
+ bool overlap = false;
+
+ if (spec->is_default)
+ {
+ if (boundinfo == NULL || !partition_bound_has_default(boundinfo))
+ return;
+
+ /* Default partition already exists, error out. */
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("partition \"%s\" conflicts with existing default partition \"%s\"",
+ relname, get_rel_name(partdesc->oids[boundinfo->default_index])),
+ parser_errposition(pstate, spec->location)));
+ }
+
+ switch (key->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ Assert(spec->strategy == PARTITION_STRATEGY_HASH);
+ Assert(spec->remainder >= 0 && spec->remainder < spec->modulus);
+
+ if (partdesc->nparts > 0)
+ {
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ Datum **datums = boundinfo->datums;
+ int ndatums = boundinfo->ndatums;
+ int greatest_modulus;
+ int remainder;
+ int offset;
+ bool valid_modulus = true;
+ int prev_modulus, /* Previous largest modulus */
+ next_modulus; /* Next largest modulus */
+
+ /*
+ * Check rule that every modulus must be a factor of the
+ * next larger modulus. For example, if you have a bunch
+ * of partitions that all have modulus 5, you can add a
+ * new partition with modulus 10 or a new partition with
+ * modulus 15, but you cannot add both a partition with
+ * modulus 10 and a partition with modulus 15, because 10
+ * is not a factor of 15.
+ *
+ * Get the greatest (modulus, remainder) pair contained in
+ * boundinfo->datums that is less than or equal to the
+ * (spec->modulus, spec->remainder) pair.
+ */
+ offset = partition_hash_bsearch(key, boundinfo,
+ spec->modulus,
+ spec->remainder);
+ if (offset < 0)
+ {
+ next_modulus = DatumGetInt32(datums[0][0]);
+ valid_modulus = (next_modulus % spec->modulus) == 0;
+ }
+ else
+ {
+ prev_modulus = DatumGetInt32(datums[offset][0]);
+ valid_modulus = (spec->modulus % prev_modulus) == 0;
+
+ if (valid_modulus && (offset + 1) < ndatums)
+ {
+ next_modulus = DatumGetInt32(datums[offset + 1][0]);
+ valid_modulus = (next_modulus % spec->modulus) == 0;
+ }
+ }
+
+ if (!valid_modulus)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("every hash partition modulus must be a factor of the next larger modulus")));
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ remainder = spec->remainder;
+
+ /*
+ * Normally, the lowest remainder that could conflict with
+ * the new partition is equal to the remainder specified
+ * for the new partition, but when the new partition has a
+ * modulus higher than any used so far, we need to adjust.
+ */
+ if (remainder >= greatest_modulus)
+ remainder = remainder % greatest_modulus;
+
+ /* Check every potentially-conflicting remainder. */
+ do
+ {
+ if (boundinfo->indexes[remainder] != -1)
+ {
+ overlap = true;
+ with = boundinfo->indexes[remainder];
+ break;
+ }
+ remainder += spec->modulus;
+ } while (remainder < greatest_modulus);
+ }
+
+ break;
+ }
+
+ case PARTITION_STRATEGY_LIST:
+ {
+ Assert(spec->strategy == PARTITION_STRATEGY_LIST);
+
+ if (partdesc->nparts > 0)
+ {
+ ListCell *cell;
+
+ Assert(boundinfo &&
+ boundinfo->strategy == PARTITION_STRATEGY_LIST &&
+ (boundinfo->ndatums > 0 ||
+ partition_bound_accepts_nulls(boundinfo) ||
+ partition_bound_has_default(boundinfo)));
+
+ foreach(cell, spec->listdatums)
+ {
+ Const *val = castNode(Const, lfirst(cell));
+
+ if (!val->constisnull)
+ {
+ int offset;
+ bool equal;
+
+ offset = partition_list_bsearch(key, boundinfo,
+ val->constvalue,
+ &equal);
+ if (offset >= 0 && equal)
+ {
+ overlap = true;
+ with = boundinfo->indexes[offset];
+ break;
+ }
+ }
+ else if (partition_bound_accepts_nulls(boundinfo))
+ {
+ overlap = true;
+ with = boundinfo->null_index;
+ break;
+ }
+ }
+ }
+
+ break;
+ }
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartitionRangeBound *lower,
+ *upper;
+
+ Assert(spec->strategy == PARTITION_STRATEGY_RANGE);
+ lower = make_one_range_bound(key, -1, spec->lowerdatums, true);
+ upper = make_one_range_bound(key, -1, spec->upperdatums, false);
+
+ /*
+ * First check if the resulting range would be empty with
+ * specified lower and upper bounds
+ */
+ if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
+ upper) >= 0)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("empty range bound specified for partition \"%s\"",
+ relname),
+ errdetail("Specified lower bound %s is greater than or equal to upper bound %s.",
+ get_range_partbound_string(spec->lowerdatums),
+ get_range_partbound_string(spec->upperdatums)),
+ parser_errposition(pstate, spec->location)));
+ }
+
+ if (partdesc->nparts > 0)
+ {
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int offset;
+ bool equal;
+
+ Assert(boundinfo &&
+ boundinfo->strategy == PARTITION_STRATEGY_RANGE &&
+ (boundinfo->ndatums > 0 ||
+ partition_bound_has_default(boundinfo)));
+
+ /*
+ * Test whether the new lower bound (which is treated
+ * inclusively as part of the new partition) lies inside
+ * an existing partition, or in a gap.
+ *
+ * If it's inside an existing partition, the bound at
+ * offset + 1 will be the upper bound of that partition,
+ * and its index will be >= 0.
+ *
+ * If it's in a gap, the bound at offset + 1 will be the
+ * lower bound of the next partition, and its index will
+ * be -1. This is also true if there is no next partition,
+ * since the index array is initialised with an extra -1
+ * at the end.
+ */
+ offset = partition_range_bsearch(key, boundinfo, lower,
+ &equal);
+
+ if (boundinfo->indexes[offset + 1] < 0)
+ {
+ /*
+ * Check that the new partition will fit in the gap.
+ * For it to fit, the new upper bound must be less
+ * than or equal to the lower bound of the next
+ * partition, if there is one.
+ */
+ if (offset + 1 < boundinfo->ndatums)
+ {
+ int32 cmpval;
+ Datum *datums;
+ PartitionRangeDatumKind *kind;
+ bool is_lower;
+
+ datums = boundinfo->datums[offset + 1];
+ kind = boundinfo->kind[offset + 1];
+ is_lower = (boundinfo->indexes[offset + 1] == -1);
+
+ cmpval = partition_rbound_cmp(key, datums, kind,
+ is_lower, upper);
+ if (cmpval < 0)
+ {
+ /*
+ * The new partition overlaps with the
+ * existing partition between offset + 1 and
+ * offset + 2.
+ */
+ overlap = true;
+ with = boundinfo->indexes[offset + 2];
+ }
+ }
+ }
+ else
+ {
+ /*
+ * The new partition overlaps with the existing
+ * partition between offset and offset + 1.
+ */
+ overlap = true;
+ with = boundinfo->indexes[offset + 1];
+ }
+ }
+
+ break;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ if (overlap)
+ {
+ Assert(with >= 0);
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("partition \"%s\" would overlap partition \"%s\"",
+ relname, get_rel_name(partdesc->oids[with])),
+ parser_errposition(pstate, spec->location)));
+ }
+}
+
+/*
+ * RelationGetPartitionQual
+ *
+ * Returns a list of partition quals
+ */
+List *
+RelationGetPartitionQual(Relation rel)
+{
+ /* Quick exit */
+ if (!rel->rd_rel->relispartition)
+ return NIL;
+
+ return generate_partition_qual(rel);
+}
+
+/*
+ * get_partition_qual_relid
+ *
+ * Returns an expression tree describing the passed-in relation's partition
+ * constraint. If there is no partition constraint returns NULL; this can
+ * happen if the default partition is the only partition.
+ */
+Expr *
+get_partition_qual_relid(Oid relid)
+{
+ Relation rel = heap_open(relid, AccessShareLock);
+ Expr *result = NULL;
+ List *and_args;
+
+ /* Do the work only if this relation is a partition. */
+ if (rel->rd_rel->relispartition)
+ {
+ and_args = generate_partition_qual(rel);
+
+ if (and_args == NIL)
+ result = NULL;
+ else if (list_length(and_args) > 1)
+ result = makeBoolExpr(AND_EXPR, and_args, -1);
+ else
+ result = linitial(and_args);
+ }
+
+ /* Keep the lock. */
+ heap_close(rel, NoLock);
+
+ return result;
+}
+
+/*
+ * Checks if any of the 'attnums' is a partition key attribute for rel
+ *
+ * Sets *used_in_expr if any of the 'attnums' is found to be referenced in some
+ * partition key expression. It's possible for a column to be both used
+ * directly and as part of an expression; if that happens, *used_in_expr may
+ * end up as either true or false. That's OK for current uses of this
+ * function, because *used_in_expr is only used to tailor the error message
+ * text.
+ */
+bool
+has_partition_attrs(Relation rel, Bitmapset *attnums,
+ bool *used_in_expr)
+{
+ PartitionKey key;
+ int partnatts;
+ List *partexprs;
+ ListCell *partexprs_item;
+ int i;
+
+ if (attnums == NULL || rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+ return false;
+
+ key = RelationGetPartitionKey(rel);
+ partnatts = get_partition_natts(key);
+ partexprs = get_partition_exprs(key);
+
+ partexprs_item = list_head(partexprs);
+ for (i = 0; i < partnatts; i++)
+ {
+ AttrNumber partattno = get_partition_col_attnum(key, i);
+
+ if (partattno != 0)
+ {
+ if (bms_is_member(partattno - FirstLowInvalidHeapAttributeNumber,
+ attnums))
+ {
+ if (used_in_expr)
+ *used_in_expr = false;
+ return true;
+ }
+ }
+ else
+ {
+ /* Arbitrary expression */
+ Node *expr = (Node *) lfirst(partexprs_item);
+ Bitmapset *expr_attrs = NULL;
+
+ /* Find all attributes referenced */
+ pull_varattnos(expr, 1, &expr_attrs);
+ partexprs_item = lnext(partexprs_item);
+
+ if (bms_overlap(attnums, expr_attrs))
+ {
+ if (used_in_expr)
+ *used_in_expr = true;
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
+/*
+ * get_partition_for_tuple
+ * Finds partition of relation which accepts the partition key specified
+ * in values and isnull
+ *
+ * Return value is the index of the partition (>= 0 and < partdesc->nparts)
+ * if one is found, or -1 if none is found.
+ */
+int
+get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
+{
+ int bound_offset;
+ int part_index = -1;
+ PartitionKey key = RelationGetPartitionKey(relation);
+ PartitionDesc partdesc = RelationGetPartitionDesc(relation);
+
+ /* Route as appropriate based on partitioning strategy. */
+ switch (key->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ PartitionBoundInfo boundinfo = partdesc->boundinfo;
+ int greatest_modulus = get_greatest_modulus(boundinfo);
+ uint64 rowHash = compute_hash_value(key, values, isnull);
+
+ part_index = boundinfo->indexes[rowHash % greatest_modulus];
+ }
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ if (isnull[0])
+ {
+ if (partition_bound_accepts_nulls(partdesc->boundinfo))
+ part_index = partdesc->boundinfo->null_index;
+ }
+ else
+ {
+ bool equal = false;
+
+ bound_offset = partition_list_bsearch(key,
+ partdesc->boundinfo,
+ values[0], &equal);
+ if (bound_offset >= 0 && equal)
+ part_index = partdesc->boundinfo->indexes[bound_offset];
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ bool equal = false,
+ range_partkey_has_null = false;
+ int i;
+
+ /*
+ * No range includes NULL, so this will be accepted by the
+ * default partition if there is one, and otherwise rejected.
+ */
+ for (i = 0; i < key->partnatts; i++)
+ {
+ if (isnull[i])
+ {
+ range_partkey_has_null = true;
+ break;
+ }
+ }
+
+ if (!range_partkey_has_null)
+ {
+ bound_offset = partition_range_datum_bsearch(key,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
+ /*
+ * The bound at bound_offset is less than or equal to the
+ * tuple value, so the bound at bound_offset + 1 is the upper
+ * bound of the partition we're looking for, if one actually
+ * exists.
+ */
+ part_index = partdesc->boundinfo->indexes[bound_offset + 1];
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) key->strategy);
+ }
+
+ /*
+ * part_index < 0 means we failed to find a partition of this parent. Use
+ * the default partition, if there is one.
+ */
+ if (part_index < 0)
+ part_index = partdesc->boundinfo->default_index;
+
+ return part_index;
+}
+
+/*
+ * get_greatest_modulus
+ *
+ * Returns the greatest modulus of the hash partition bound. The greatest
+ * modulus will be at the end of the datums array because hash partitions are
+ * arranged in the ascending order of their modulus and remainders.
+ */
+int
+get_greatest_modulus(PartitionBoundInfo bound)
+{
+ Assert(bound && bound->strategy == PARTITION_STRATEGY_HASH);
+ Assert(bound->datums && bound->ndatums > 0);
+ Assert(DatumGetInt32(bound->datums[bound->ndatums - 1][0]) > 0);
+
+ return DatumGetInt32(bound->datums[bound->ndatums - 1][0]);
+}
+
+/*
+ * compute_hash_value
+ *
+ * Compute the hash value for the given partition key values, skipping nulls.
+ */
+uint64
+compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+{
+ int i;
+ int nkeys = key->partnatts;
+ uint64 rowHash = 0;
+ Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
+
+ for (i = 0; i < nkeys; i++)
+ {
+ if (!isnull[i])
+ {
+ Datum hash;
+
+ Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+
+ /*
+ * Compute hash for each datum value by calling respective
+ * datatype-specific hash functions of each partition key
+ * attribute.
+ */
+ hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+
+ /* Form a single 64-bit hash value */
+ rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
+ }
+ }
+
+ return rowHash;
+}
+
+/*
+ * get_default_oid_from_partdesc
+ *
+ * Given a partition descriptor, return the OID of the default partition, if
+ * one exists; else, return InvalidOid.
+ */
+Oid
+get_default_oid_from_partdesc(PartitionDesc partdesc)
+{
+ if (partdesc && partdesc->boundinfo &&
+ partition_bound_has_default(partdesc->boundinfo))
+ return partdesc->oids[partdesc->boundinfo->default_index];
+
+ return InvalidOid;
+}
+
+/* Module-local functions. */
+
+/*
+ * generate_partition_qual
+ *
+ * Generate partition predicate from rel's partition bound expression. The
+ * function returns a NIL list if there is no predicate.
+ *
+ * Result expression tree is stored in CacheMemoryContext so that it survives
+ * as long as the relcache entry. But we should be running in a less long-lived
+ * working context. To avoid leaking cache memory if this routine fails partway
+ * through, we build in working memory and then copy the completed structure
+ * into cache memory.
+ */
+static List *
+generate_partition_qual(Relation rel)
+{
+ HeapTuple tuple;
+ MemoryContext oldcxt;
+ Datum boundDatum;
+ bool isnull;
+ PartitionBoundSpec *bound;
+ List *my_qual = NIL,
+ *result = NIL;
+ Relation parent;
+ bool found_whole_row;
+
+ /* Guard against stack overflow due to overly deep partition tree */
+ check_stack_depth();
+
+ /* Quick copy */
+ if (rel->rd_partcheck != NIL)
+ return copyObject(rel->rd_partcheck);
+
+ /* Grab at least an AccessShareLock on the parent table */
+ parent = heap_open(get_partition_parent(RelationGetRelid(rel)),
+ AccessShareLock);
+
+ /* Get pg_class.relpartbound */
+ tuple = SearchSysCache1(RELOID, RelationGetRelid(rel));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for relation %u",
+ RelationGetRelid(rel));
+
+ boundDatum = SysCacheGetAttr(RELOID, tuple,
+ Anum_pg_class_relpartbound,
+ &isnull);
+ if (isnull) /* should not happen */
+ elog(ERROR, "relation \"%s\" has relpartbound = null",
+ RelationGetRelationName(rel));
+ bound = castNode(PartitionBoundSpec,
+ stringToNode(TextDatumGetCString(boundDatum)));
+ ReleaseSysCache(tuple);
+
+ my_qual = get_qual_from_partbound(rel, parent, bound);
+
+ /* Add the parent's quals to the list (if any) */
+ if (parent->rd_rel->relispartition)
+ result = list_concat(generate_partition_qual(parent), my_qual);
+ else
+ result = my_qual;
+
+ /*
+ * Change Vars to have partition's attnos instead of the parent's. We do
+ * this after we concatenate the parent's quals, because we want every Var
+ * in it to bear this relation's attnos. It's safe to assume varno = 1
+ * here.
+ */
+ result = map_partition_varattnos(result, 1, rel, parent,
+ &found_whole_row);
+ /* There can never be a whole-row reference here */
+ if (found_whole_row)
+ elog(ERROR, "unexpected whole-row reference found in partition key");
+
+ /* Save a copy in the relcache */
+ oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+ rel->rd_partcheck = copyObject(result);
+ MemoryContextSwitchTo(oldcxt);
+
+ /* Keep the parent locked until commit */
+ heap_close(parent, NoLock);
+
+ return result;
+}
+
+/*
+ * partition_hbound_cmp
+ *
+ * Compares modulus first, then remainder if the moduli are equal.
+ */
+static int32
+partition_hbound_cmp(int modulus1, int remainder1, int modulus2, int remainder2)
+{
+ if (modulus1 < modulus2)
+ return -1;
+ if (modulus1 > modulus2)
+ return 1;
+ if (modulus1 == modulus2 && remainder1 != remainder2)
+ return (remainder1 > remainder2) ? 1 : -1;
+ return 0;
+}
+
+/*
+ * qsort_partition_hbound_cmp
+ *
+ * We sort hash bounds by modulus, then by remainder.
+ */
+static int32
+qsort_partition_hbound_cmp(const void *a, const void *b)
+{
+ PartitionHashBound *h1 = (*(PartitionHashBound *const *) a);
+ PartitionHashBound *h2 = (*(PartitionHashBound *const *) b);
+
+ return partition_hbound_cmp(h1->modulus, h1->remainder,
+ h2->modulus, h2->remainder);
+}
+
+/*
+ * qsort_partition_list_value_cmp
+ *
+ * Compare two list partition bound datums
+ */
+static int32
+qsort_partition_list_value_cmp(const void *a, const void *b, void *arg)
+{
+ Datum val1 = (*(const PartitionListValue **) a)->value,
+ val2 = (*(const PartitionListValue **) b)->value;
+ PartitionKey key = (PartitionKey) arg;
+
+ return DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ val1, val2));
+}
+
+/*
+ * make_one_range_bound
+ *
+ * Return a PartitionRangeBound given a list of PartitionRangeDatum elements
+ * and a flag telling whether the bound is lower or not. Made into a function
+ * because there are multiple sites that want to use this facility.
+ */
+static PartitionRangeBound *
+make_one_range_bound(PartitionKey key, int index, List *datums, bool lower)
+{
+ PartitionRangeBound *bound;
+ ListCell *lc;
+ int i;
+
+ Assert(datums != NIL);
+
+ bound = (PartitionRangeBound *) palloc0(sizeof(PartitionRangeBound));
+ bound->index = index;
+ bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
+ bound->kind = (PartitionRangeDatumKind *) palloc0(key->partnatts *
+ sizeof(PartitionRangeDatumKind));
+ bound->lower = lower;
+
+ i = 0;
+ foreach(lc, datums)
+ {
+ PartitionRangeDatum *datum = castNode(PartitionRangeDatum, lfirst(lc));
+
+ /* What's contained in this range datum? */
+ bound->kind[i] = datum->kind;
+
+ if (datum->kind == PARTITION_RANGE_DATUM_VALUE)
+ {
+ Const *val = castNode(Const, datum->value);
+
+ if (val->constisnull)
+ elog(ERROR, "invalid range bound datum");
+ bound->datums[i] = val->constvalue;
+ }
+
+ i++;
+ }
+
+ return bound;
+}
+
+/*
+ * partition_rbound_cmp
+ *
+ * Return for two range bounds whether the 1st one (specified in datums1,
+ * kind1, and lower1) is <, =, or > the bound specified in *b2.
+ *
+ * Note that if the values of the two range bounds compare equal, then we take
+ * into account whether they are upper or lower bounds, and an upper bound is
+ * considered to be smaller than a lower bound. This is important to the way
+ * that RelationBuildPartitionDesc() builds the PartitionBoundInfoData
+ * structure, which only stores the upper bound of a common boundary between
+ * two contiguous partitions.
+ */
+static int32
+partition_rbound_cmp(PartitionKey key,
+ Datum *datums1, PartitionRangeDatumKind *kind1,
+ bool lower1, PartitionRangeBound *b2)
+{
+ int32 cmpval = 0; /* placate compiler */
+ int i;
+ Datum *datums2 = b2->datums;
+ PartitionRangeDatumKind *kind2 = b2->kind;
+ bool lower2 = b2->lower;
+
+ for (i = 0; i < key->partnatts; i++)
+ {
+ /*
+ * First, handle cases where the column is unbounded, which should not
+ * invoke the comparison procedure, and should not consider any later
+ * columns. Note that the PartitionRangeDatumKind enum elements
+ * compare the same way as the values they represent.
+ */
+ if (kind1[i] < kind2[i])
+ return -1;
+ else if (kind1[i] > kind2[i])
+ return 1;
+ else if (kind1[i] != PARTITION_RANGE_DATUM_VALUE)
+
+ /*
+ * The column bounds are both MINVALUE or both MAXVALUE. No later
+ * columns should be considered, but we still need to compare
+ * whether they are upper or lower bounds.
+ */
+ break;
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
+ key->partcollation[i],
+ datums1[i],
+ datums2[i]));
+ if (cmpval != 0)
+ break;
+ }
+
+ /*
+ * If the comparison is anything other than equal, we're done. If they
+ * compare equal though, we still have to consider whether the boundaries
+ * are inclusive or exclusive. Exclusive one is considered smaller of the
+ * two.
+ */
+ if (cmpval == 0 && lower1 != lower2)
+ cmpval = lower1 ? 1 : -1;
+
+ return cmpval;
+}
+
+/* Used when sorting range bounds across all range partitions */
+static int32
+qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
+{
+ PartitionRangeBound *b1 = (*(PartitionRangeBound *const *) a);
+ PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
+ PartitionKey key = (PartitionKey) arg;
+
+ return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+}
+
+/*
+ * partition_list_bsearch
+ * Returns the index of the greatest bound datum that is less than or
+ * equal to the given value, or -1 if all of the bound datums are greater
+ *
+ * *is_equal is set to true if the bound datum at the returned index is equal
+ * to the input value.
+ */
+static int
+partition_list_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal)
+{
+ int lo,
+ hi,
+ mid;
+
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval;
+
+ mid = (lo + hi + 1) / 2;
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
+ key->partcollation[0],
+ boundinfo->datums[mid][0],
+ value));
+ if (cmpval <= 0)
+ {
+ lo = mid;
+ *is_equal = (cmpval == 0);
+ if (*is_equal)
+ break;
+ }
+ else
+ hi = mid - 1;
+ }
+
+ return lo;
+}
+
+/*
+ * partition_rbound_datum_cmp
+ *
+ * Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
+ * is <, =, or > partition key of tuple (tuple_datums)
+ */
+static int32
+partition_rbound_datum_cmp(PartitionKey key,
+ Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+ Datum *tuple_datums, int n_tuple_datums)
+{
+ int i;
+ int32 cmpval = -1;
+
+ for (i = 0; i < n_tuple_datums; i++)
+ {
+ if (rb_kind[i] == PARTITION_RANGE_DATUM_MINVALUE)
+ return -1;
+ else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
+ return 1;
+
+ cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
+ key->partcollation[i],
+ rb_datums[i],
+ tuple_datums[i]));
+ if (cmpval != 0)
+ break;
+ }
+
+ return cmpval;
+}
+
+/*
+ * partition_range_bsearch
+ * Returns the index of the greatest range bound that is less than or
+ * equal to the given range bound or -1 if all of the range bounds are
+ * greater
+ *
+ * *is_equal is set to true if the range bound at the returned index is equal
+ * to the input range bound
+ */
+static int
+partition_range_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ PartitionRangeBound *probe, bool *is_equal)
+{
+ int lo,
+ hi,
+ mid;
+
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval;
+
+ mid = (lo + hi + 1) / 2;
+ cmpval = partition_rbound_cmp(key,
+ boundinfo->datums[mid],
+ boundinfo->kind[mid],
+ (boundinfo->indexes[mid] == -1),
+ probe);
+ if (cmpval <= 0)
+ {
+ lo = mid;
+ *is_equal = (cmpval == 0);
+
+ if (*is_equal)
+ break;
+ }
+ else
+ hi = mid - 1;
+ }
+
+ return lo;
+}
+
+/*
+ * partition_range_datum_bsearch
+ * Returns the index of the greatest range bound that is less than or
+ * equal to the given tuple or -1 if all of the range bounds are greater
+ *
+ * *is_equal is set to true if the range bound at the returned index is equal
+ * to the input tuple.
+ */
+static int
+partition_range_datum_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal)
+{
+ int lo,
+ hi,
+ mid;
+
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval;
+
+ mid = (lo + hi + 1) / 2;
+ cmpval = partition_rbound_datum_cmp(key,
+ boundinfo->datums[mid],
+ boundinfo->kind[mid],
+ values,
+ nvalues);
+ if (cmpval <= 0)
+ {
+ lo = mid;
+ *is_equal = (cmpval == 0);
+
+ if (*is_equal)
+ break;
+ }
+ else
+ hi = mid - 1;
+ }
+
+ return lo;
+}
+
+/*
+ * partition_hash_bsearch
+ * Returns the index of the greatest (modulus, remainder) pair that is
+ * less than or equal to the given (modulus, remainder) pair or -1 if
+ * all of them are greater
+ */
+static int
+partition_hash_bsearch(PartitionKey key,
+ PartitionBoundInfo boundinfo,
+ int modulus, int remainder)
+{
+ int lo,
+ hi,
+ mid;
+
+ lo = -1;
+ hi = boundinfo->ndatums - 1;
+ while (lo < hi)
+ {
+ int32 cmpval,
+ bound_modulus,
+ bound_remainder;
+
+ mid = (lo + hi + 1) / 2;
+ bound_modulus = DatumGetInt32(boundinfo->datums[mid][0]);
+ bound_remainder = DatumGetInt32(boundinfo->datums[mid][1]);
+ cmpval = partition_hbound_cmp(bound_modulus, bound_remainder,
+ modulus, remainder);
+ if (cmpval <= 0)
+ {
+ lo = mid;
+
+ if (cmpval == 0)
+ break;
+ }
+ else
+ hi = mid - 1;
+ }
+
+ return lo;
+}
+
+/*
+ * get_partition_bound_num_indexes
+ *
+ * Returns the number of the entries in the partition bound indexes array.
+ */
+static int
+get_partition_bound_num_indexes(PartitionBoundInfo bound)
+{
+ int num_indexes;
+
+ Assert(bound);
+
+ switch (bound->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+
+ /*
+ * The number of entries in the indexes array is the same as the
+ * greatest modulus.
+ */
+ num_indexes = get_greatest_modulus(bound);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ num_indexes = bound->ndatums;
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* Range partitioned table has an extra index. */
+ num_indexes = bound->ndatums + 1;
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) bound->strategy);
+ }
+
+ return num_indexes;
+}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 1ebf9c4ed2..a747f53e7f 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -81,6 +81,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/relmapper.h"
#include "utils/resowner_private.h"
#include "utils/snapmgr.h"
@@ -261,7 +262,6 @@ static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_hi
static Relation AllocateRelationDesc(Form_pg_class relp);
static void RelationParseRelOptions(Relation relation, HeapTuple tuple);
static void RelationBuildTupleDesc(Relation relation);
-static void RelationBuildPartitionKey(Relation relation);
static Relation RelationBuildDesc(Oid targetRelId, bool insertIt);
static void RelationInitPhysicalAddr(Relation relation);
static void load_critical_index(Oid indexoid, Oid heapoid);
@@ -809,209 +809,6 @@ RelationBuildRuleLock(Relation relation)
}
/*
- * RelationBuildPartitionKey
- * Build and attach to relcache partition key data of relation
- *
- * Partitioning key data is a complex structure; to avoid complicated logic to
- * free individual elements whenever the relcache entry is flushed, we give it
- * its own memory context, child of CacheMemoryContext, which can easily be
- * deleted on its own. To avoid leaking memory in that context in case of an
- * error partway through this function, the context is initially created as a
- * child of CurTransactionContext and only re-parented to CacheMemoryContext
- * at the end, when no further errors are possible. Also, we don't make this
- * context the current context except in very brief code sections, out of fear
- * that some of our callees allocate memory on their own which would be leaked
- * permanently.
- */
-static void
-RelationBuildPartitionKey(Relation relation)
-{
- Form_pg_partitioned_table form;
- HeapTuple tuple;
- bool isnull;
- int i;
- PartitionKey key;
- AttrNumber *attrs;
- oidvector *opclass;
- oidvector *collation;
- ListCell *partexprs_item;
- Datum datum;
- MemoryContext partkeycxt,
- oldcxt;
- int16 procnum;
-
- tuple = SearchSysCache1(PARTRELID,
- ObjectIdGetDatum(RelationGetRelid(relation)));
-
- /*
- * The following happens when we have created our pg_class entry but not
- * the pg_partitioned_table entry yet.
- */
- if (!HeapTupleIsValid(tuple))
- return;
-
- partkeycxt = AllocSetContextCreateExtended(CurTransactionContext,
- RelationGetRelationName(relation),
- MEMCONTEXT_COPY_NAME,
- ALLOCSET_SMALL_SIZES);
-
- key = (PartitionKey) MemoryContextAllocZero(partkeycxt,
- sizeof(PartitionKeyData));
-
- /* Fixed-length attributes */
- form = (Form_pg_partitioned_table) GETSTRUCT(tuple);
- key->strategy = form->partstrat;
- key->partnatts = form->partnatts;
-
- /*
- * We can rely on the first variable-length attribute being mapped to the
- * relevant field of the catalog's C struct, because all previous
- * attributes are non-nullable and fixed-length.
- */
- attrs = form->partattrs.values;
-
- /* But use the hard way to retrieve further variable-length attributes */
- /* Operator class */
- datum = SysCacheGetAttr(PARTRELID, tuple,
- Anum_pg_partitioned_table_partclass, &isnull);
- Assert(!isnull);
- opclass = (oidvector *) DatumGetPointer(datum);
-
- /* Collation */
- datum = SysCacheGetAttr(PARTRELID, tuple,
- Anum_pg_partitioned_table_partcollation, &isnull);
- Assert(!isnull);
- collation = (oidvector *) DatumGetPointer(datum);
-
- /* Expressions */
- datum = SysCacheGetAttr(PARTRELID, tuple,
- Anum_pg_partitioned_table_partexprs, &isnull);
- if (!isnull)
- {
- char *exprString;
- Node *expr;
-
- exprString = TextDatumGetCString(datum);
- expr = stringToNode(exprString);
- pfree(exprString);
-
- /*
- * Run the expressions through const-simplification since the planner
- * will be comparing them to similarly-processed qual clause operands,
- * and may fail to detect valid matches without this step; fix
- * opfuncids while at it. We don't need to bother with
- * canonicalize_qual() though, because partition expressions are not
- * full-fledged qualification clauses.
- */
- expr = eval_const_expressions(NULL, expr);
- fix_opfuncids(expr);
-
- oldcxt = MemoryContextSwitchTo(partkeycxt);
- key->partexprs = (List *) copyObject(expr);
- MemoryContextSwitchTo(oldcxt);
- }
-
- oldcxt = MemoryContextSwitchTo(partkeycxt);
- key->partattrs = (AttrNumber *) palloc0(key->partnatts * sizeof(AttrNumber));
- key->partopfamily = (Oid *) palloc0(key->partnatts * sizeof(Oid));
- key->partopcintype = (Oid *) palloc0(key->partnatts * sizeof(Oid));
- key->partsupfunc = (FmgrInfo *) palloc0(key->partnatts * sizeof(FmgrInfo));
-
- key->partcollation = (Oid *) palloc0(key->partnatts * sizeof(Oid));
-
- /* Gather type and collation info as well */
- key->parttypid = (Oid *) palloc0(key->partnatts * sizeof(Oid));
- key->parttypmod = (int32 *) palloc0(key->partnatts * sizeof(int32));
- key->parttyplen = (int16 *) palloc0(key->partnatts * sizeof(int16));
- key->parttypbyval = (bool *) palloc0(key->partnatts * sizeof(bool));
- key->parttypalign = (char *) palloc0(key->partnatts * sizeof(char));
- key->parttypcoll = (Oid *) palloc0(key->partnatts * sizeof(Oid));
- MemoryContextSwitchTo(oldcxt);
-
- /* determine support function number to search for */
- procnum = (key->strategy == PARTITION_STRATEGY_HASH) ?
- HASHEXTENDED_PROC : BTORDER_PROC;
-
- /* Copy partattrs and fill other per-attribute info */
- memcpy(key->partattrs, attrs, key->partnatts * sizeof(int16));
- partexprs_item = list_head(key->partexprs);
- for (i = 0; i < key->partnatts; i++)
- {
- AttrNumber attno = key->partattrs[i];
- HeapTuple opclasstup;
- Form_pg_opclass opclassform;
- Oid funcid;
-
- /* Collect opfamily information */
- opclasstup = SearchSysCache1(CLAOID,
- ObjectIdGetDatum(opclass->values[i]));
- if (!HeapTupleIsValid(opclasstup))
- elog(ERROR, "cache lookup failed for opclass %u", opclass->values[i]);
-
- opclassform = (Form_pg_opclass) GETSTRUCT(opclasstup);
- key->partopfamily[i] = opclassform->opcfamily;
- key->partopcintype[i] = opclassform->opcintype;
-
- /* Get a support function for the specified opfamily and datatypes */
- funcid = get_opfamily_proc(opclassform->opcfamily,
- opclassform->opcintype,
- opclassform->opcintype,
- procnum);
- if (!OidIsValid(funcid))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
- errmsg("operator class \"%s\" of access method %s is missing support function %d for type %s",
- NameStr(opclassform->opcname),
- (key->strategy == PARTITION_STRATEGY_HASH) ?
- "hash" : "btree",
- procnum,
- format_type_be(opclassform->opcintype))));
-
- fmgr_info(funcid, &key->partsupfunc[i]);
-
- /* Collation */
- key->partcollation[i] = collation->values[i];
-
- /* Collect type information */
- if (attno != 0)
- {
- Form_pg_attribute att = TupleDescAttr(relation->rd_att, attno - 1);
-
- key->parttypid[i] = att->atttypid;
- key->parttypmod[i] = att->atttypmod;
- key->parttypcoll[i] = att->attcollation;
- }
- else
- {
- if (partexprs_item == NULL)
- elog(ERROR, "wrong number of partition key expressions");
-
- key->parttypid[i] = exprType(lfirst(partexprs_item));
- key->parttypmod[i] = exprTypmod(lfirst(partexprs_item));
- key->parttypcoll[i] = exprCollation(lfirst(partexprs_item));
-
- partexprs_item = lnext(partexprs_item);
- }
- get_typlenbyvalalign(key->parttypid[i],
- &key->parttyplen[i],
- &key->parttypbyval[i],
- &key->parttypalign[i]);
-
- ReleaseSysCache(opclasstup);
- }
-
- ReleaseSysCache(tuple);
-
- /*
- * Success --- reparent our context and make the relcache point to the
- * newly constructed key
- */
- MemoryContextSetParent(partkeycxt, CacheMemoryContext);
- relation->rd_partkeycxt = partkeycxt;
- relation->rd_partkey = key;
-}
-
-/*
* equalRuleLocks
*
* Determine whether two RuleLocks are equivalent
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..894c8f4091 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -14,63 +14,22 @@
#define PARTITION_H
#include "fmgr.h"
-#include "executor/tuptable.h"
-#include "nodes/execnodes.h"
-#include "parser/parse_node.h"
#include "utils/rel.h"
/* Seed for the extended hash function */
#define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD)
-/*
- * PartitionBoundInfo encapsulates a set of partition bounds. It is usually
- * associated with partitioned tables as part of its partition descriptor.
- *
- * The internal structure is opaque outside partition.c.
- */
-typedef struct PartitionBoundInfoData *PartitionBoundInfo;
-
-/*
- * Information about partitions of a partitioned table.
- */
-typedef struct PartitionDescData
-{
- int nparts; /* Number of partitions */
- Oid *oids; /* OIDs of partitions */
- PartitionBoundInfo boundinfo; /* collection of partition bounds */
-} PartitionDescData;
-
-typedef struct PartitionDescData *PartitionDesc;
-
-extern void RelationBuildPartitionDesc(Relation relation);
-extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
- bool *parttypbyval, PartitionBoundInfo b1,
- PartitionBoundInfo b2);
-extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
- PartitionKey key);
-
-extern void check_new_partition_bound(char *relname, Relation parent,
- PartitionBoundSpec *spec);
extern Oid get_partition_parent(Oid relid);
extern List *get_qual_from_partbound(Relation rel, Relation parent,
PartitionBoundSpec *spec);
extern List *map_partition_varattnos(List *expr, int fromrel_varno,
Relation to_rel, Relation from_rel,
bool *found_whole_row);
-extern List *RelationGetPartitionQual(Relation rel);
-extern Expr *get_partition_qual_relid(Oid relid);
-extern bool has_partition_attrs(Relation rel, Bitmapset *attnums,
- bool *used_in_expr);
-extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
extern Oid get_default_partition_oid(Oid parentId);
extern void update_default_partition_oid(Oid parentId, Oid defaultPartId);
extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
-/* For tuple routing */
-extern int get_partition_for_tuple(Relation relation, Datum *values,
- bool *isnull);
-
#endif /* PARTITION_H */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3df9c498bb..c53dfcc265 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -13,10 +13,10 @@
#ifndef EXECPARTITION_H
#define EXECPARTITION_H
-#include "catalog/partition.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "utils/partcache.h"
/*-----------------------
* PartitionDispatch - information about one partitioned table in a partition
diff --git a/src/include/utils/partcache.h b/src/include/utils/partcache.h
new file mode 100644
index 0000000000..5d4caeda3a
--- /dev/null
+++ b/src/include/utils/partcache.h
@@ -0,0 +1,191 @@
+/*-------------------------------------------------------------------------
+ *
+ * partcache.h
+ * Header file for partitioning related cached data structures and
+ * manipulation functions
+ *
+ * Copyright (c) 2007-2018, PostgreSQL Global Development Group
+ *
+ * src/include/utils/partcache.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTCACHE_H
+#define PARTCACHE_H
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "nodes/parsenodes.h"
+#include "utils/lsyscache.h"
+#include "utils/relcache.h"
+
+/*
+ * Information about the partition key of a relation
+ */
+typedef struct PartitionKeyData
+{
+ char strategy; /* partitioning strategy */
+ int16 partnatts; /* number of columns in the partition key */
+ AttrNumber *partattrs; /* attribute numbers of columns in the
+ * partition key */
+ List *partexprs; /* list of expressions in the partitioning
+ * key, or NIL */
+
+ Oid *partopfamily; /* OIDs of operator families */
+ Oid *partopcintype; /* OIDs of opclass declared input data types */
+ FmgrInfo *partsupfunc; /* lookup info for support funcs */
+
+ /* Partitioning collation per attribute */
+ Oid *partcollation;
+
+ /* Type information per attribute */
+ Oid *parttypid;
+ int32 *parttypmod;
+ int16 *parttyplen;
+ bool *parttypbyval;
+ char *parttypalign;
+ Oid *parttypcoll;
+} PartitionKeyData;
+
+typedef struct PartitionKeyData *PartitionKey;
+
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+
+/*
+ * Information about partitions of a partitioned table.
+ */
+typedef struct PartitionDescData
+{
+ int nparts; /* Number of partitions */
+ Oid *oids; /* OIDs of partitions */
+ PartitionBoundInfo boundinfo; /* collection of partition bounds */
+} PartitionDescData;
+
+typedef struct PartitionDescData *PartitionDesc;
+
+/*
+ * Information about bounds of a partitioned relation
+ *
+ * A list partition datum that is known to be NULL is never put into the
+ * datums array. Instead, it is tracked using the null_index field.
+ *
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them (the
+ * upper bound). In case of hash partitioning, ndatums will be same as the
+ * number of partitions.
+ *
+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each. For hash partitioned tables, it is an array
+ * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
+ * given partition.
+ *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
+ * is an upper bound. In the case of hash partitioning, the number of the
+ * entries in the indexes array is same as the greatest modulus amongst all
+ * partitions. For a given partition key datum-tuple, the index of the
+ * partition which would accept that datum-tuple would be given by the entry
+ * pointed by remainder produced when hash value of the datum-tuple is divided
+ * by the greatest modulus.
+ */
+
+typedef struct PartitionBoundInfoData
+{
+ char strategy; /* hash, list or range? */
+ int ndatums; /* Length of the datums following array */
+ Datum **datums;
+ PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
+ * NULL for hash and list partitioned
+ * tables */
+ int *indexes; /* Partition indexes */
+ int null_index; /* Index of the null-accepting partition; -1
+ * if there isn't one */
+ int default_index; /* Index of the default partition; -1 if there
+ * isn't one */
+} PartitionBoundInfoData;
+
+#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
+#define partition_bound_has_default(bi) ((bi)->default_index != -1)
+
+/*
+ * PartitionKey inquiry functions
+ */
+static inline int
+get_partition_strategy(PartitionKey key)
+{
+ return key->strategy;
+}
+
+static inline int
+get_partition_natts(PartitionKey key)
+{
+ return key->partnatts;
+}
+
+static inline List *
+get_partition_exprs(PartitionKey key)
+{
+ return key->partexprs;
+}
+
+/*
+ * PartitionKey inquiry functions - one column
+ */
+static inline int16
+get_partition_col_attnum(PartitionKey key, int col)
+{
+ return key->partattrs[col];
+}
+
+static inline Oid
+get_partition_col_typid(PartitionKey key, int col)
+{
+ return key->parttypid[col];
+}
+
+static inline int32
+get_partition_col_typmod(PartitionKey key, int col)
+{
+ return key->parttypmod[col];
+}
+
+extern void RelationBuildPartitionKey(Relation relation);
+extern void RelationBuildPartitionDesc(Relation relation);
+extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
+ bool *parttypbyval, PartitionBoundInfo b1,
+ PartitionBoundInfo b2);
+
+extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src,
+ PartitionKey key);
+
+extern void check_new_partition_bound(char *relname, Relation parent,
+ PartitionBoundSpec *spec);
+
+extern List *RelationGetPartitionQual(Relation rel);
+extern Expr *get_partition_qual_relid(Oid relid);
+
+extern bool has_partition_attrs(Relation rel, Bitmapset *attnums,
+ bool *used_in_expr);
+
+extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc);
+
+extern int get_greatest_modulus(PartitionBoundInfo b);
+extern uint64 compute_hash_value(PartitionKey key, Datum *values,
+ bool *isnull);
+
+/* For tuple routing */
+extern int get_partition_for_tuple(Relation relation, Datum *values,
+ bool *isnull);
+
+#endif /* PARTCACHE_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index aa8add544a..b531ef0121 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -24,6 +24,7 @@
#include "rewrite/prs2lock.h"
#include "storage/block.h"
#include "storage/relfilenode.h"
+#include "utils/partcache.h"
#include "utils/relcache.h"
#include "utils/reltrigger.h"
@@ -47,36 +48,6 @@ typedef struct LockInfoData
typedef LockInfoData *LockInfo;
/*
- * Information about the partition key of a relation
- */
-typedef struct PartitionKeyData
-{
- char strategy; /* partitioning strategy */
- int16 partnatts; /* number of columns in the partition key */
- AttrNumber *partattrs; /* attribute numbers of columns in the
- * partition key */
- List *partexprs; /* list of expressions in the partitioning
- * key, or NIL */
-
- Oid *partopfamily; /* OIDs of operator families */
- Oid *partopcintype; /* OIDs of opclass declared input data types */
- FmgrInfo *partsupfunc; /* lookup info for support funcs */
-
- /* Partitioning collation per attribute */
- Oid *partcollation;
-
- /* Type information per attribute */
- Oid *parttypid;
- int32 *parttypmod;
- int16 *parttyplen;
- bool *parttypbyval;
- char *parttypalign;
- Oid *parttypcoll;
-} PartitionKeyData;
-
-typedef struct PartitionKeyData *PartitionKey;
-
-/*
* Here are the contents of a relation cache entry.
*/
@@ -593,48 +564,6 @@ typedef struct ViewOptions
#define RelationGetPartitionKey(relation) ((relation)->rd_partkey)
/*
- * PartitionKey inquiry functions
- */
-static inline int
-get_partition_strategy(PartitionKey key)
-{
- return key->strategy;
-}
-
-static inline int
-get_partition_natts(PartitionKey key)
-{
- return key->partnatts;
-}
-
-static inline List *
-get_partition_exprs(PartitionKey key)
-{
- return key->partexprs;
-}
-
-/*
- * PartitionKey inquiry functions - one column
- */
-static inline int16
-get_partition_col_attnum(PartitionKey key, int col)
-{
- return key->partattrs[col];
-}
-
-static inline Oid
-get_partition_col_typid(PartitionKey key, int col)
-{
- return key->parttypid[col];
-}
-
-static inline int32
-get_partition_col_typmod(PartitionKey key, int col)
-{
- return key->parttypmod[col];
-}
-
-/*
* RelationGetPartitionDesc
* Returns partition descriptor for a relation.
*/
--
2.11.0
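For readers following along, the comment on PartitionBoundInfoData in partcache.h above says that for hash partitioning the indexes array has as many entries as the greatest modulus, and that a row is routed using the remainder of its hash value divided by that modulus. Below is a minimal standalone sketch of that lookup. HashBound, fill_hash_indexes() and route_hash() are hypothetical names used only for illustration, not part of the patch, and the way the array is filled is my assumption of the obvious scheme, which relies on every modulus dividing the greatest one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one hash partition's bound. */
typedef struct HashBound
{
	int			modulus;
	int			remainder;
} HashBound;

/* Greatest modulus among all partitions; it sizes the indexes array. */
static int
greatest_modulus(const HashBound *bounds, int nparts)
{
	int			i;
	int			max = 0;

	for (i = 0; i < nparts; i++)
	{
		if (bounds[i].modulus > max)
			max = bounds[i].modulus;
	}
	return max;
}

/*
 * Fill indexes[0 .. greatest - 1]: slot r holds the index of the partition
 * accepting hash values whose remainder modulo the greatest modulus is r,
 * or -1 if no partition accepts them.
 */
static void
fill_hash_indexes(const HashBound *bounds, int nparts,
				  int *indexes, int greatest)
{
	int			i;
	int			r;

	for (r = 0; r < greatest; r++)
		indexes[r] = -1;

	for (i = 0; i < nparts; i++)
	{
		/* Assumption: every modulus is a factor of the greatest one. */
		assert(greatest % bounds[i].modulus == 0);

		for (r = bounds[i].remainder; r < greatest; r += bounds[i].modulus)
			indexes[r] = i;
	}
}

/* Route a row: one modulo operation and one array lookup. */
static int
route_hash(const int *indexes, int greatest, uint64_t row_hash)
{
	return indexes[row_hash % (uint64_t) greatest];
}
```

With partitions (modulus 2, remainder 0), (modulus 4, remainder 1) and (modulus 4, remainder 3), the array has four slots, and slots 0 and 2 both point at the first partition. Routing does not depend on the number of partitions, which is why only the greatest modulus matters when sizing the array.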
On 2018/02/09 21:36, Amit Langote wrote:
0004-Faster-partition-pruning.patch
The main patch that adds src/backend/optimizer/util/partprune.c, a module
to provide the functionality that will replace the current approach of
calling relation_excluded_by_constraints() for each partition.
Sorry, but there is still this big TODO here, which I'll try to fix early
next week.
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * TODO: write a longer description of things in this file
And I tried to fix that to some degree in the attached updated version.
Thanks,
Amit
Attachments:
v26-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From ce847cbf17d8b9981cad9342945da968d42592e3 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v26 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c80c7f1a..2a64757584 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1116,8 +1118,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1177,7 +1180,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2814,7 +2820,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2823,6 +2831,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2831,7 +2843,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2841,7 +2853,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2862,8 +2874,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2887,9 +2899,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2903,8 +2920,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2981,7 +2998,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3043,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
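To make the intent of 0001 a bit more concrete, here is a rough standalone sketch of a range bound comparison that takes only the number of key columns and the per-column comparators instead of a whole PartitionKey, so that a caller with no key at hand can still use it. ColumnCmp, int_cmp() and range_bound_cmp() are hypothetical names for illustration. Collations and the MINVALUE/MAXVALUE datum kinds that the real partition_rbound_cmp() handles are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-column comparator, standing in for partsupfunc[i]. */
typedef int (*ColumnCmp) (int a, int b);

static int
int_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * Compare two range bounds column by column.  Only the members that the
 * comparison needs are passed in, not a key structure.
 */
static int
range_bound_cmp(int partnatts, const ColumnCmp *partsupfunc,
				const int *datums1, bool lower1,
				const int *datums2, bool lower2)
{
	int			i;
	int			cmpval = 0;

	assert(partnatts > 0);

	for (i = 0; i < partnatts; i++)
	{
		cmpval = partsupfunc[i] (datums1[i], datums2[i]);
		if (cmpval != 0)
			break;
	}

	/*
	 * If the values compare equal, an upper bound is considered smaller
	 * than a lower bound, as the comment in the patch describes.
	 */
	if (cmpval == 0 && lower1 != lower2)
		cmpval = lower1 ? 1 : -1;

	return cmpval;
}
```

The tie-break at the end is what lets the upper bound of one partition and the equal lower bound of the next one sort next to each other in a predictable order.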
v26-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From d4ef46c0049ee91aecc684971e4cb720a2383dc0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v26 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signature and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2a64757584..dccaa232a9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
@@ -1007,7 +1009,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1083,7 +1085,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1158,7 +1162,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2577,7 +2584,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2593,7 +2602,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2622,11 +2632,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2940,7 +2952,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2955,8 +2967,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2983,7 +2995,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2998,8 +3011,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3028,7 +3040,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3043,8 +3055,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3071,8 +3083,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3270,27 +3281,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
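A note for readers of the 0002 changes: the bsearch helpers touched above share one search convention, visible in the `mid = (lo + hi + 1) / 2` and `cmpval <= 0` lines. They return the offset of the greatest bound that is <= the probe, or -1 if every bound is greater. Below is a minimal standalone sketch of that convention over a plain sorted int array. The function name greatest_bound_le() and the integer comparison are stand-ins assumed for illustration; the patch itself compares Datums through partsupfunc and partcollation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: return the index of the
 * greatest element of the sorted array 'bounds' that is <= 'probe', or -1
 * if every element is greater.  *is_equal reports an exact match.
 */
static int
greatest_bound_le(const int *bounds, int nbounds, int probe, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;
	while (lo < hi)
	{
		/* Round up so that "lo = mid" always makes progress. */
		int			mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= probe)
		{
			lo = mid;
			*is_equal = (bounds[mid] == probe);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

Callers then adjust the returned offset themselves, for example stepping right by one when the match is inexact and they need the first bound greater than the probe.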
Attachment: v26-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From ef5c820b18874649881c90d28c7a747c7daa8daf Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v26 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 41 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,6 +1922,22 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
+ memcpy(part_scheme->parttyplen, partkey->parttyplen,
+ sizeof(int16) * partnatts);
+
+ part_scheme->parttypbyval = (bool *) palloc(sizeof(bool) * partnatts);
+ memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
+ sizeof(bool) * partnatts);
+
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
memcpy(part_scheme->partopfamily, partkey->partopfamily,
sizeof(Oid) * partnatts);
@@ -1926,17 +1946,14 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->partopcintype, partkey->partopcintype,
sizeof(Oid) * partnatts);
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
sizeof(Oid) * partnatts);
- part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
- memcpy(part_scheme->parttyplen, partkey->parttyplen,
- sizeof(int16) * partnatts);
-
- part_scheme->parttypbyval = (bool *) palloc(sizeof(bool) * partnatts);
- memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
- sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
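A note on the matching logic in find_partition_scheme(): it decides whether two schemes agree by running memcmp() over per-column arrays, so the byte count passed to memcmp() has to be derived from the array's element type. The following is a small standalone sketch of that idea. The Oid typedef is a stand-in for PostgreSQL's own, and both function names are hypothetical; prefix_bytes_equal() is included only to show what happens when the length is computed from the wrong type.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical helper: compare all 'natts' elements of two Oid arrays. */
static bool
oid_arrays_equal(const Oid *a, const Oid *b, int natts)
{
	assert(natts >= 0);
	return memcmp(a, b, sizeof(Oid) * natts) == 0;
}

/*
 * Pitfall, shown for contrast: sizing the comparison with sizeof(bool)
 * looks at only the first few bytes of the arrays, so differences in
 * later elements go unnoticed.
 */
static bool
prefix_bytes_equal(const Oid *a, const Oid *b, int natts)
{
	assert(natts >= 0);
	return memcmp(a, b, sizeof(bool) * natts) == 0;
}
```

With a one-byte bool, two two-element arrays that differ only in their second Oid compare as different under oid_arrays_equal() but as equal under prefix_bytes_equal().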
Attachment: v26-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 7eefca109fdc0088ca2deaf7d0240a44990addf2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v26 4/5] Faster partition pruning
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 669 ++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1439 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 85 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 40 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 430 +++++++-
src/test/regress/sql/partition_prune.sql | 77 +-
18 files changed, 2804 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dccaa232a9..4648c2c92f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,6 +196,15 @@ static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1563,9 +1572,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of partitions that can safely be excluded because
+ * 'ne_datums' covers, with not-equal clauses, every value that the
+ * partition can contain.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_list_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_list_bsearch() works. If the bound at maxoff exactly
+ * matches maxkey (is_equal), but the maxkey is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch() works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch() would've returned the offset of just one of
+ * those. If minkey is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but it may not be the case. If it is instead the
+ * upper bound of an unassigned range of values, move minoff/maxoff to
+ * the adjacent bound, which must be the upper bound of a valid
+ * partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. However, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
+ {
+ for (i = 0; i < partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of indexes of partitions that can safely be removed
+ * due to each such partition's every allowable non-null datum appearing in
+ * a <> operator clause.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..a3048e46ef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5020,6 +5039,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e842f93d0..98d7a19dad 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Bitmapset *live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..71a7b7b0a2
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1439 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * effort only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for pruning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. In particular, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns a Bitmapset of the RT indexes of relations belonging to the
+ * minimum set of partitions which must be scanned to satisfy rel's
+ * baserestrictinfo quals.
+ */
+Bitmapset *
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Bitmapset *result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+
+ context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
+ context.partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ while ((i = bms_first_member(partindexes)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = makeNode(PartitionClauseInfo);
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ context->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which while not
+ * listed as part of any operator family, we are able to
+ * handle it if its negator is an equality operator that
+ * is in turn part of the operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ continue;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields except clauseinfo are the same as in the parent context;
+ * clauseinfo will be set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets a field
+ * in context->clauseinfo to inform the caller that we found such a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ if (clauseinfo->ne_clauses)
+ {
+ keys->ne_datums = (Datum *)
+ palloc0(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns the kind of operator (=, </<=, or >/>=) that the clause in 'pc'
+ * contains and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
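To make the prefix rule in extract_bounding_datums() above easier to follow, here is a minimal standalone sketch of just the counting logic for the non-hash case. It is not part of the patch: ClauseKind, KeyCounts and count_bound_keys() are hypothetical names, each key is assumed to carry at most one clause, and every clause value is assumed to be a known constant (so the partkey_datum_from_expr() check always succeeds).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the operator kind of the clause on one key. */
typedef enum { K_NONE, K_EQ, K_LT, K_LE, K_GT, K_GE } ClauseKind;

/* Hypothetical subset of PartScanKeyInfo: only counts and inclusivity. */
typedef struct KeyCounts
{
    int  n_eqkeys, n_minkeys, n_maxkeys;
    bool min_incl, max_incl;
} KeyCounts;

/* Count usable eq/min/max keys for range or list partitioning. */
static KeyCounts
count_bound_keys(const ClauseKind *kinds, int natts)
{
    KeyCounts k;
    bool need_eq = true, need_min = true, need_max = true;

    assert(kinds != NULL && natts > 0);
    memset(&k, 0, sizeof(k));

    for (int i = 0; i < natts; i++)
    {
        /* A gap in the key prefix stops further collection. */
        if (i > k.n_eqkeys)  need_eq = false;
        if (i > k.n_minkeys) need_min = false;
        if (i > k.n_maxkeys) need_max = false;

        switch (kinds[i])
        {
            case K_EQ:      /* an equality value extends all three tuples */
                if (need_eq)  k.n_eqkeys++;
                if (need_max) { k.n_maxkeys++; k.max_incl = true; }
                if (need_min) { k.n_minkeys++; k.min_incl = true; }
                break;
            case K_LT:
            case K_LE:
                if (need_max)
                {
                    k.n_maxkeys++;
                    k.max_incl = (kinds[i] == K_LE);
                    if (!k.max_incl)    /* a strict bound ends the tuple */
                        need_eq = need_max = false;
                }
                break;
            case K_GT:
            case K_GE:
                if (need_min)
                {
                    k.n_minkeys++;
                    k.min_incl = (kinds[i] == K_GE);
                    if (!k.min_incl)
                        need_eq = need_min = false;
                }
                break;
            case K_NONE:
                break;
        }
    }

    /* eqkeys are usable only if every key has one; otherwise use bounds. */
    if (k.n_eqkeys == natts)
        k.n_minkeys = k.n_maxkeys = 0;
    else
        k.n_eqkeys = 0;
    return k;
}
```

For a three-column key with clauses (=, =, <), as in the mc3p query "a = 1 and abs(b) = 1 and c < 8", this sketch yields a two-column inclusive minimum tuple and a three-column exclusive maximum tuple. With (=, none, =) only the first column contributes, because the gap at the second key breaks the prefix.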
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
+
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
- result = list_concat(result, pcqual);
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5c368321e6..5b5be8fe16 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..0dd6bd3020 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,87 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Whether the clauses that supplied the last minkeys and maxkeys values
+ * are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /* Datum values from clauses containing <> operator */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +154,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..6cfb876218 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,44 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause that matches a
+ * partition key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ NodeTag type;
+
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..5c0d469600
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Bitmapset *prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
v26-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From adf8d323ca5840ba8a13802d75bb8d62499ce53d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v26 5/5] Add only unpruned partitioned child rels to
partitioned_rels
---
src/backend/optimizer/path/allpaths.c | 69 ++++++++++++++++-------------------
src/backend/optimizer/plan/planner.c | 19 +++++++---
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/relation.h | 8 ++++
4 files changed, 56 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 98d7a19dad..0adcfad958 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,7 +878,10 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ rel->live_partitioned_rels = list_make1_int(rti);
+ }
/*
* Initialize to compute size estimates for whole append relation.
@@ -1358,44 +1361,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
List *partitioned_rels = NIL;
RangeTblEntry *rte;
- bool build_partitioned_rels = false;
double partial_rows = -1;
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel. We can find them in rel->live_partitioned_rels. However,
+ * it contains only the immediate children, so collect those of the
+ * children that are partitioned themselves in loop below and concatenate
+ * all into one list to be passed to the path creation function.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), whose child sub-
+ * queries may contain references to partitioned tables. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
if (IS_SIMPLE_REL(rel))
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
- {
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
- }
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_PARTITIONED_TABLE)
+ partitioned_rels = rel->live_partitioned_rels;
}
else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
{
/*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
+ * For a joinrel consisting of partitioned tables, construct the
+ * list by combining live_partitioned_rels of the component
+ * partitioned tables, which is what the following does.
*/
partitioned_rels = get_partitioned_child_rels_for_join(root,
rel->relids);
@@ -1413,17 +1411,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Path *cheapest_partial_path = NULL;
/*
- * If we need to build partitioned_rels, accumulate the partitioned
- * rels for this child.
+ * Accumulate the live partitioned children of this child, if it's
+ * itself partitioned rel.
*/
- if (build_partitioned_rels)
- {
- List *cprels;
-
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
+ if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
- list_copy(cprels));
- }
+ list_copy(childrel->live_partitioned_rels));
/*
* If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..cebe7a54ad 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5976,14 +5976,23 @@ List *
get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
{
List *result = NIL;
- ListCell *l;
+ int relid;
- foreach(l, root->pcinfo_list)
+ relid = -1;
+ while ((relid = bms_next_member(join_relids, relid)) >= 0)
{
- PartitionedChildRelInfo *pc = lfirst(l);
+ RelOptInfo *rel;
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
+ /* Paranoia: ignore bogus relid indexes */
+ if (relid >= root->simple_rel_array_size)
+ continue;
+ rel = root->simple_rel_array[relid];
+ if (rel == NULL)
+ continue;
+ Assert(rel->relid == relid); /* sanity check on array */
+ Assert(rel->part_scheme != NULL);
+ Assert(list_length(rel->live_partitioned_rels) >= 1);
+ result = list_concat(result, list_copy(rel->live_partitioned_rels));
}
return result;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5b5be8fe16..ad40ac7f8b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->live_partitioned_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_partitioned_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->live_partitioned_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..6454954e3b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -542,6 +542,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * live_partitioned_rels - RT indexes of unpruned partitions that are
+ * partitioned tables themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +676,12 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+
+ /*
+ * RT indexes of live partitions that are partitioned tables themselves.
+ * This includes the RT index of the table itself.
+ */
+ List *live_partitioned_rels;
} RelOptInfo;
/*
--
2.11.0
Thanks for working on this. May I suggest to open a completely new
thread?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 13, 2018 at 2:17 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Agree with the proposed reorganizing and adding a partcache.c, which I
tried to do in the attached patch.

* The new src/backend/utils/cache/partcache.c contains functions that
initialize relcache's partitioning related fields. Various partition
bound comparison and search functions (and then some) that work off of the
cached information are moved.

Are you moving the partition bound comparison functions to partcache.c?
They will also be used by the optimizer, so maybe leave them out of
partcache.c?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
David Rowley wrote:
On 19 January 2018 at 16:00, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
And I'd like to ask David to check out his mail environment so
that SPF record is available for his message.
Will investigate
This should be fixed now. Please let us know if you still see problems.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/02/13 22:23, Alvaro Herrera wrote:
Thanks for working on this. May I suggest to open a completely new
thread?
Done.
Thanks,
Amit
On 2018/02/13 20:08, Amit Langote wrote:
On 2018/02/09 21:36, Amit Langote wrote:
0004-Faster-partition-pruning.patch
The main patch that adds src/backend/optimizer/util/partprune.c, a module
to provide the functionality that will replace the current approach of
calling relation_excluded_by_constraints() for each partition.
Sorry, but there is still this big TODO here, which I'll try to fix early
next week.
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * TODO: write a longer description of things in this file
And I tried to fix that to some degree in the attached updated version.
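For readers following along, the difference between testing every partition in turn and searching the sorted bounds directly can be sketched roughly as below. This is only an illustration and not code from partprune.c: the integer bounds array, the half-open range convention, and the function names find_partition_linear() and find_partition_bsearch() are all assumptions made up for this example. The midpoint computation follows the same (lo + hi + 1) / 2 form used by the bsearch helpers in partition.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layout for this sketch: bounds[] is sorted ascending and has
 * nparts + 1 entries; partition i accepts values v with
 * bounds[i] <= v < bounds[i + 1].
 */

/* One-by-one check, analogous in spirit to testing each partition in turn. */
static int
find_partition_linear(const int *bounds, int nparts, int value)
{
	int			i;

	for (i = 0; i < nparts; i++)
	{
		if (value >= bounds[i] && value < bounds[i + 1])
			return i;
	}
	return -1;					/* no partition accepts the value */
}

/* Binary search for the greatest bound that is <= value. */
static int
find_partition_bsearch(const int *bounds, int nparts, int value)
{
	int			lo = -1;
	int			hi = nparts;	/* index of the last bound */

	assert(bounds != NULL && nparts > 0);

	while (lo < hi)
	{
		int			mid = (lo + hi + 1) / 2;

		if (bounds[mid] <= value)
			lo = mid;
		else
			hi = mid - 1;
	}

	/* lo == -1: below the first bound; lo == nparts: at or past the last */
	if (lo < 0 || lo >= nparts)
		return -1;
	return lo;
}
```

Both functions return the same partition index for any input; the second does so in a logarithmic number of comparisons, which is the kind of saving that matters when a table has many partitions.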
Here is an updated version.
I realized that 0005 (Add only unpruned partitioned child rels to
partitioned_rels) did that only for (Merge)Append. That is, it didn't
handle ModifyTable. I fixed that by teaching inheritance_planner() to do
it. In the process, I found out that we don't need the
PartitionedChildRelInfo node and related code anymore, so the patch ended
up removing more code than adding it.
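As a rough picture of what "only unpruned partitioned child rels" means, the following standalone sketch walks a small partition tree and collects the RT indexes of the partitioned tables that survive pruning. It is a simplification under stated assumptions, not the patch's code: SketchRel is a made-up stand-in for RelOptInfo, the pruned flag is assumed to have been set by an earlier pruning step, and the recursion replaces the per-level list_concat() of live_partitioned_rels that the patch performs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 16

/* Hypothetical, simplified stand-in for RelOptInfo. */
typedef struct SketchRel
{
	int			relid;			/* RT index of this rel */
	int			is_partitioned; /* nonzero if it is a partitioned table */
	int			pruned;			/* nonzero if pruning eliminated it */
	int			nchildren;
	struct SketchRel *children[MAX_RELS];
} SketchRel;

/*
 * Append to out[] the RT index of rel, if it is an unpruned partitioned
 * table, followed by those of its unpruned partitioned descendants.
 * Returns the new number of entries in out[].
 */
static int
collect_live_partitioned_rels(const SketchRel *rel, int *out, int n, int cap)
{
	int			i;

	/* Leaf partitions and pruned subtrees contribute nothing. */
	if (rel == NULL || rel->pruned || !rel->is_partitioned)
		return n;

	assert(n < cap);
	out[n++] = rel->relid;

	for (i = 0; i < rel->nchildren; i++)
		n = collect_live_partitioned_rels(rel->children[i], out, n, cap);

	return n;
}
```

With a root whose two sub-partitioned children are one live and one pruned, only the root and the live child end up in the list, so a consumer of the list would never need to touch the pruned subtree.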
Thanks,
Amit
Attachments:
v27-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From b2d3d25619bf75137d5aae3699aababd09b6b591 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v27 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 31c80c7f1a..2a64757584 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1116,8 +1118,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1177,7 +1180,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2814,7 +2820,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2823,6 +2831,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2831,7 +2843,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2841,7 +2853,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2862,8 +2874,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2887,9 +2899,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2903,8 +2920,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2981,7 +2998,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3043,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
v27-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From 7bab3e70369733e325c7340a516fa3c50f0412b0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v27 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signatures and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2a64757584..dccaa232a9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
@@ -1007,7 +1009,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1083,7 +1085,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1158,7 +1162,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2577,7 +2584,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2593,7 +2602,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2622,11 +2632,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2940,7 +2952,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2955,8 +2967,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2983,7 +2995,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2998,8 +3011,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3028,7 +3040,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3043,8 +3055,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3071,8 +3083,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3270,27 +3281,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
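For readers following along: the patch above changes the bsearch helpers to take the comparison support function and collation directly instead of a PartitionKey. The search itself keeps the same shape, returning the offset of the greatest bound that is less than or equal to the probe value, or -1 if every bound is greater. A minimal standalone sketch of that loop, assuming plain int bounds and a hypothetical comparator type cmp_fn in place of FmgrInfo (none of these names come from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a partition support (comparison) function. */
typedef int (*cmp_fn) (int a, int b);

static int
int_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * Return the offset of the greatest bound <= value, or -1 if all bounds
 * are greater than value.  *is_equal is set to true only if the bound at
 * the returned offset compares equal to value.  bounds[] must be sorted
 * in ascending order according to cmp.
 */
static int
bound_bsearch(cmp_fn cmp, const int *bounds, int nbounds,
			  int value, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	assert(nbounds >= 0);
	*is_equal = false;

	while (lo < hi)
	{
		/* Round up so that the loop always makes progress. */
		int			mid = (lo + hi + 1) / 2;
		int			cmpval = cmp(bounds[mid], value);

		if (cmpval <= 0)
		{
			lo = mid;
			*is_equal = (cmpval == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}

	return lo;
}
```

Because the comparator is passed in explicitly, a caller that has no PartitionKey at hand (such as planner code) can still run the search, which appears to be the motivation for the signature change.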
Attachment: v27-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From 9064e7ea782d69e3837b36765653b4b20f5d086b Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v27 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
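The matching logic in find_partition_scheme() above boils down to "find an existing scheme whose per-column arrays are bytewise equal, else add a new one". A rough standalone sketch of that pattern follows. It assumes a simplified, hypothetical Scheme struct with only two of the arrays, a linked list instead of root->part_schemes, and malloc instead of palloc; the thing to notice is that each memcmp length uses the element type of the array being compared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical, trimmed-down analogue of PartitionSchemeData. */
typedef struct Scheme
{
	int			partnatts;
	Oid		   *parttypid;
	Oid		   *partcollation;
	struct Scheme *next;
} Scheme;

static Oid *
copy_oids(const Oid *src, int n)
{
	Oid		   *dst = malloc(sizeof(Oid) * n);

	assert(dst != NULL);
	memcpy(dst, src, sizeof(Oid) * n);
	return dst;
}

/* Return a matching scheme from the list, adding a new one if needed. */
static Scheme *
find_or_add_scheme(Scheme **head, int partnatts,
				   const Oid *parttypid, const Oid *partcollation)
{
	Scheme	   *s;

	for (s = *head; s != NULL; s = s->next)
	{
		if (s->partnatts != partnatts)
			continue;
		/* Lengths are in bytes, so scale by the element type (Oid). */
		if (memcmp(s->parttypid, parttypid,
				   sizeof(Oid) * partnatts) != 0 ||
			memcmp(s->partcollation, partcollation,
				   sizeof(Oid) * partnatts) != 0)
			continue;
		return s;
	}

	s = malloc(sizeof(Scheme));
	assert(s != NULL);
	s->partnatts = partnatts;
	s->parttypid = copy_oids(parttypid, partnatts);
	s->partcollation = copy_oids(partcollation, partnatts);
	s->next = *head;
	*head = s;
	return s;
}
```

Two tables whose keys agree on every array share one scheme pointer, so later code can compare schemes by pointer equality; a differing collation yields a separate scheme.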
Attachment: v27-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 87b338584ae97c24074717f38f615a2b2fbdef43 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v27 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 669 ++++++++++++
src/backend/nodes/copyfuncs.c | 22 +
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1439 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 85 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/primnodes.h | 40 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 430 +++++++-
src/test/regress/sql/partition_prune.sql | 77 +-
18 files changed, 2804 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
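One of the less obvious pieces this patch adds to partition.c is the handling of <> clauses for list partitions: a partition can be excluded only if every value it accepts appears in some <> clause. A minimal sketch of that counting idea follows. It assumes int datums, a linear scan instead of the binary search the patch uses, and an unsigned bitmask instead of a Bitmapset; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BOUNDS 64
#define MAX_PARTS  32

/*
 * bounds[j] is a list bound value and bound_part[j] is the index of the
 * partition that accepts it.  Returns a bitmask of partitions for which
 * every accepted value appears in ne_vals[].
 */
static unsigned int
parts_excluded_by_ne(const int *bounds, const int *bound_part, int nbounds,
					 int nparts, const int *ne_vals, int n_ne)
{
	bool		found[MAX_BOUNDS] = {false};
	int			in_part[MAX_PARTS] = {0};
	int			hit[MAX_PARTS] = {0};
	unsigned int excluded = 0;
	int			i,
				j;

	assert(nbounds <= MAX_BOUNDS && nparts <= MAX_PARTS);

	/* Mark each bound that is named by some <> value (duplicates collapse). */
	for (i = 0; i < n_ne; i++)
		for (j = 0; j < nbounds; j++)
			if (bounds[j] == ne_vals[i])
				found[j] = true;

	/* Count accepted values and matched values per partition. */
	for (j = 0; j < nbounds; j++)
	{
		in_part[bound_part[j]]++;
		if (found[j])
			hit[bound_part[j]]++;
	}

	/*
	 * A partition with no bounds at all (such as a default partition) has
	 * hit == 0 and is therefore never excluded here.
	 */
	for (i = 0; i < nparts; i++)
		if (hit[i] > 0 && hit[i] == in_part[i])
			excluded |= 1u << i;

	return excluded;
}
```

For example, with partition 0 accepting {1, 2} and partition 1 accepting {3}, the clauses "a <> 1 AND a <> 2" exclude partition 0, while "a <> 1" alone excludes nothing.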
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dccaa232a9..dc061adca0 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -196,6 +196,15 @@ static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
/* SQL-callable function for use in hash partition CHECK constraints */
PG_FUNCTION_INFO_V1(satisfies_hash_partition);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1563,9 +1572,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of partitions that are excluded because
+ * *all* of their allowed datums appear in keys->ne_datums, that
+ * is, are compared to the partition key using the <> operator
+ * in the query.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_list_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_list_bsearch works. If the bound at maxoff exactly
+ * matches maxkey (is_equal), but the maxkey is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ /*
+ * We might be able to get the answer sooner based on the nullness of
+ * keys, so get that out of the way.
+ */
+ for (i = 0; i < partnatts; i++)
+ {
+ if (bms_is_member(i, keys->keyisnull))
+ {
+ /* Only the default partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower bound
+ * of the partition that eqkeys falls into, the bound at eqoff + 1
+ * would be its upper bound, so use eqoff + 1 to get the desired
+ * partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just one of
+ * those. If minkey is inclusive, we must decrement minoff until it
+ * reaches the leftmost of those bound values, so that partitions
+ * corresponding to all those bound values are selected. If minkeys
+ * is exclusive, we must increment minoff until it reaches the first
+ * bound greater than this prefix, so that none of the partitions
+ * corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make
+ * maxoff point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * as the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, we believe that minoff/maxoff point to the upper bound
+ * of some partition, but it may not be the case. It might actually be
+ * the upper bound of an unassigned range of values; if so, move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
+ {
+ for (i = 0; i < partnatts; i++)
+ {
+ if (!bms_is_member(i, keys->keyisnotnull))
+ {
+ include_def = true;
+ break;
+ }
+ }
+ }
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of indexes of partitions that can safely be removed
+ * due to each such partition's every allowable non-null datum appearing in
+ * a <> operator clause.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate
+ * the entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..a3048e46ef 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,25 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+static PartitionClauseInfo *
+_copyPartitionClauseInfo(const PartitionClauseInfo *from)
+{
+ PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
+
+ int i;
+ for (i = 0; i < PARTITION_MAX_KEYS; i++)
+ COPY_NODE_FIELD(keyclauses[i]);
+
+ COPY_NODE_FIELD(or_clauses);
+ COPY_NODE_FIELD(ne_clauses);
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+ COPY_SCALAR_FIELD(constfalse);
+ COPY_SCALAR_FIELD(foundkeyclauses);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5020,6 +5039,9 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionClauseInfo:
+ retval = _copyPartitionClauseInfo(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e842f93d0..98d7a19dad 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Bitmapset *live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..71a7b7b0a2
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1439 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * effort only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. Caller must have set
+ * up a valid partition pruning context in the form of struct
+ * PartitionPruneContext, that is, each of its fields other than clauseinfo
+ * must be valid before calling here. After extracting relevant clauses,
+ * clauseinfo is filled with information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. In particular, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns a Bitmapset of the RT indexes of relations belonging to the
+ * minimum set of partitions which must be scanned to satisfy rel's
+ * baserestrictinfo quals.
+ */
+Bitmapset *
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Bitmapset *result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+
+ context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
+ context.partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ while ((i = bms_first_member(partindexes)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = makeNode(PartitionClauseInfo);
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ if (IsA(clause, OpExpr))
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ continue;
+ }
+ else
+ /* Clause does not match this partition key. */
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ continue;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ continue;
+
+ /*
+ * Handle cases where the clause's operator does not belong to
+ * the partitioning operator family. We currently handle two
+ * such cases: 1. Operators named '<>' are not listed in any
+ * operator family whatsoever, 2. Ordering operators like '<'
+ * are not listed in the hash operator families. For 1, check
+ * if list partitioning is in use and if so, proceed to pass
+ * the clause to the caller without doing any more processing
+ * ourselves. 2 cannot be handled at all, so the clause is
+ * simply skipped.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is an equality operator. If it's a btree
+ * equality operator *and* this is a list partitioned
+ * table, we can use it to prune partitions.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber &&
+ context->strategy == PARTITION_STRATEGY_LIST)
+ is_ne_listp = true;
+ }
+
+ /* Cannot handle this clause. */
+ if (!is_ne_listp)
+ continue;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey))
+ continue;
+
+ /*
+ * The clause is also useless if its collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ continue;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ continue;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which, while not
+ * listed as part of any operator family, we are able to
+ * handle if its negator is an equality operator that
+ * is in turn part of the operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator = get_negator(saop_op);
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ if (!OidIsValid(negator))
+ continue;
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ continue;
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ continue;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+ /*
+ * Boolean clauses have a special shape, which would've been
+ * accepted if the partitioning opfamily accepts Boolean
+ * conditions.
+ */
+ else if (IsBooleanOpfamily(partopfamily) &&
+ (IsA(clause, BooleanTest) ||
+ IsA(clause, Var) ||
+ not_clause((Node *) clause)))
+ {
+ Expr *leftop,
+ *rightop;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ continue;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+ else
+ {
+ leftop = IsA(clause, Var)
+ ? (Expr *) clause
+ : (Expr *) get_notclausearg((Expr *) clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Clause does not match this partition key. */
+ if (!equal(leftop, partkey))
+ continue;
+
+ rightop = IsA(clause, Var)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of partitions of relation, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields are the same as in the parent context, except clauseinfo,
+ * which will be set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a clause and, before returning, sets
+ * constfalse in context->clauseinfo to inform the caller of the contradiction.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and cur is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and cur is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and cur is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* cur is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep it around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add the non-NULL ones to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc', and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all
+ * keys be matched by some combination of equality clauses and IS NULL
+ * clauses. LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning; finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing a >/>= or
+ * </<= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * a tuple as both the minimum and the maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing the =
+ * operator for all partition key columns; if so, we don't need the
+ * values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to come from operator
+ * clauses. There, any IS NULL clauses involving partition key columns
+ * are also considered as equality keys by the code for hash partition
+ * pruning, which checks that all partition columns are covered before
+ * actually performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ if (clauseinfo->ne_clauses)
+ {
+ keys->ne_datums = (Datum *)
+ palloc0(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5c368321e6..5b5be8fe16 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..0dd6bd3020 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,87 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Properties found are cached and are indexed by the
+ * partition key index.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Whether the clauses found for the corresponding bound (minkeys or
+ * maxkeys) are inclusive of the stored value or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /* Datum values from clauses containing <> operator */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +154,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..0ac242aeda 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,6 +190,7 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
+ T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..6cfb876218 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,44 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionClauseInfo
+ *
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses in which at least one argument clause matches a
+ * partition key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
+ *----------
+ */
+typedef struct PartitionClauseInfo
+{
+ NodeTag type;
+
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..5c0d469600
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Bitmapset *prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
Attachment: v27-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From d65f092fb82a890b3e95dba111290d31ecee6dd3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v27 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet
known which of those tables the plan will actually affect.
Instead, construct the list just before the Path for such a plan is
built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
the related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 106 insertions(+), 221 deletions(-)
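For readers skimming the diffstat, here is a minimal sketch of the bubbling-up idea described in the commit message. It is a toy model, not the patch's code: ToyRel, toy_collect_partitioned_rels and the fixed-size arrays are hypothetical, and a single recursive walk stands in for what the patch splits between set_append_rel_size() and set_append_rel_pathlist().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RELS 16

/* Hypothetical, simplified stand-in for a RelOptInfo. */
typedef struct ToyRel
{
    int             rti;            /* range-table index */
    bool            partitioned;    /* is this a partitioned table? */
    bool            pruned;         /* excluded by pruning (a dummy rel) */
    struct ToyRel  *children[TOY_MAX_RELS];
    int             nchildren;
    int             partitioned_child_rels[TOY_MAX_RELS];
    int             npartitioned;
} ToyRel;

/*
 * Rebuild rel->partitioned_child_rels: start with the rel's own index if
 * it is partitioned, then append the lists of its unpruned children.
 * Pruned subtrees contribute nothing.
 */
static void
toy_collect_partitioned_rels(ToyRel *rel)
{
    int         i, j;

    assert(rel != NULL);
    rel->npartitioned = 0;
    if (!rel->partitioned)
        return;                     /* leaf partitions have an empty list */

    rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
    {
        ToyRel     *child = rel->children[i];

        if (child->pruned)
            continue;               /* skip dummy children entirely */

        toy_collect_partitioned_rels(child);
        for (j = 0; j < child->npartitioned; j++)
        {
            assert(rel->npartitioned < TOY_MAX_RELS);
            rel->partitioned_child_rels[rel->npartitioned++] =
                child->partitioned_child_rels[j];
        }
    }
}
```

With a root (index 1) that has one leaf, one live sub-partitioned table (index 3) and one pruned sub-partitioned table (index 6), the root's list ends up holding only indexes 1 and 3; under the old scheme described above it would have held all three.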
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index a3048e46ef..d844b8bf92 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2279,21 +2279,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5061,9 +5046,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 98d7a19dad..86063502e0 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..646d118a5f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Bitmapset *partitioned_rels_bms = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_rels_bms = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1380,7 +1382,6 @@ inheritance_planner(PlannerInfo *root)
parent_relids =
bms_add_member(parent_relids, appinfo->child_relid);
parent_roots[appinfo->child_relid] = subroot;
-
continue;
}
@@ -1427,6 +1428,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_rels_bms = bms_add_member(partitioned_rels_bms,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1532,20 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_rels_bms)
+ {
+ int parent_rti;
+
+ while ((parent_rti = bms_first_member(partitioned_rels_bms)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, parent_rti);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1553,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5950,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5b5be8fe16..5743cbb8f6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 0ac242aeda..81d223c0db 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,7 +261,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
On 2 February 2018 at 23:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
2. PartitionClauseInfo->keyclauses is a list of PartClause which is
not a node type. This will cause _copyPartitionClauseInfo() to fail.
I'm still not quite sure the best way to fix #2 since PartClause
contains a FmgrInfo. I do have a local fix which moves PartClause to
primnodes.h and makes it a proper node type. I also added a copy
function which does not copy any of the cache fields in PartClause. It
just sets valid_cache to false. I didn't particularly think this was
the correct fix. I just couldn't think of how exactly this should be
done at the time.
The attached patch also adds the missing nodetag from
PartitionClauseInfo and also fixes up other code so as we don't memset
the node memory to zero, as that would destroy the tag. I ended up
just having extract_partition_key_clauses do the makeNode call. This
also resulted in populate_partition_clauses being renamed to
generate_partition_clauses
I started wondering if it's not such a good idea to make
PartitionClauseInfo a Node at all? I went back to your earlier message
[1] where you said that it's put into the Append node for run-time pruning
to use, but it doesn't sound nice that we'd be putting into the plan
something that looks more like a scratchpad for the partition.c code. I
think we should try to keep PartitionClauseInfo in partition.h and put
only the list of matched bare clauses into Append.
That sounds like a good idea.
A patch which puts this back is attached.
I've changed the run-time prune patch to process the clause lists
during execution instead.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
make_PartitionClauseInfo_a_nonnode_type.patch (application/octet-stream)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d844b8b..1bb76dd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,25 +2132,6 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
-static PartitionClauseInfo *
-_copyPartitionClauseInfo(const PartitionClauseInfo *from)
-{
- PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
-
- int i;
- for (i = 0; i < PARTITION_MAX_KEYS; i++)
- COPY_NODE_FIELD(keyclauses[i]);
-
- COPY_NODE_FIELD(or_clauses);
- COPY_NODE_FIELD(ne_clauses);
- COPY_BITMAPSET_FIELD(keyisnull);
- COPY_BITMAPSET_FIELD(keyisnotnull);
- COPY_SCALAR_FIELD(constfalse);
- COPY_SCALAR_FIELD(foundkeyclauses);
-
- return newnode;
-}
-
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,9 +5005,6 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
- case T_PartitionClauseInfo:
- retval = _copyPartitionClauseInfo(from);
- break;
/*
* RELATION NODES
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 0dd6bd3..26a8df8 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,40 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+/* Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses whose at least one argument clause matches a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on a IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each members is a List itself of a given OR clauses's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
typedef struct PartitionDescData *PartitionDesc;
typedef struct PartitionPruneContext
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 81d223c..c097da6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,7 +190,6 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
- T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 6cfb876..1b4b0d7 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,44 +1506,4 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
-/*----------
- * PartitionClauseInfo
- *
- * Stores clauses which were matched to a partition key.
- *
- * Each matching "operator" clause is stored in the 'keyclauses' list for the
- * partition key that it was matched to, except if the operator is <>, in
- * which case, the clause is added to the 'ne_clauses' list.
- *
- * Boolean OR clauses whose at least one argument clause matches a partition
- * key are added to the 'or_clauses' list.
- *
- * Based on a IS NULL or IS NOT NULL clause that was matched to a partition
- * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
- *----------
- */
-typedef struct PartitionClauseInfo
-{
- NodeTag type;
-
- /* Lists of clauses indexed by the partition key */
- List *keyclauses[PARTITION_MAX_KEYS];
-
- /* Each members is a List itself of a given OR clauses's arguments. */
- List *or_clauses;
-
- /* List of clauses containing <> operator. */
- List *ne_clauses;
-
- /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
- Bitmapset *keyisnull;
- Bitmapset *keyisnotnull;
-
- /* True if at least one of above fields contains valid information. */
- bool foundkeyclauses;
-
- /* True if mutually contradictory clauses were found. */
- bool constfalse;
-} PartitionClauseInfo;
-
#endif /* PRIMNODES_H */
On 15 February 2018 at 18:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Here is an updated version.
Thanks for sending v27. I've had a quick look over it while I was
working on the run-time prune patch. However, I've not quite managed a
complete pass of this version yet.
A couple of things so far:
1. The following loop:
for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
/* Only the default partition accepts nulls. */
if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
}
}
could become:
if (partition_bound_has_default(boundinfo) &&
!bms_is_empty(keys->keyisnull))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
2. Is the following form of loop necessary?
for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
}
Can't this just be:
i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
Perhaps you can just Assert(i < partnatts), if you're worried about that.
Similar code exists in get_partitions_for_keys_range.
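To illustrate the loop form suggested in item 2, here is a minimal, self-contained sketch. It is not code from the patch: `toy_bitset`, `toy_next_member` and `mark_null_keys` are hypothetical stand-ins that model a set as a single 64-bit word, and only the calling convention (start at -1, stop on a negative result) is borrowed from bms_next_member().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: one 64-bit word, members 0..63. */
typedef uint64_t toy_bitset;

/*
 * Return the smallest member greater than prevbit, or -2 if there is none.
 * Callers start with prevbit = -1, as with bms_next_member().
 */
static int
toy_next_member(toy_bitset set, int prevbit)
{
    int bit;

    for (bit = prevbit + 1; bit < 64; bit++)
    {
        if (set & (UINT64_C(1) << bit))
            return bit;
    }
    return -2;
}

/*
 * Visit only the keys that are actually in the set, instead of testing
 * every key index from 0 to partnatts - 1.  Returns the number of keys seen.
 */
static int
mark_null_keys(toy_bitset keyisnull, int partnatts, bool *isnull_out)
{
    int n_eqkeys = 0;
    int i = -1;

    while ((i = toy_next_member(keyisnull, i)) >= 0)
    {
        assert(i < partnatts);  /* the set must not mention unknown keys */
        isnull_out[i] = true;
        n_eqkeys++;
    }
    return n_eqkeys;
}
```

The set is left unmodified by the iteration, so the same set can be walked again later.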
3. Several comments mention partition_bound_bsearch, but there is now
no such function.
4. "us" should be "is"
* not be any unassigned range to speak of, because the range us unbounded
5. The following code is more complex than it needs to be:
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* could be null.
*/
if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
{
for (i = 0; i < partnatts; i++)
{
if (!bms_is_member(i, keys->keyisnotnull))
{
include_def = true;
break;
}
}
}
Instead of the for loop, couldn't you just write:
include_def = (bms_num_members(keys->keyisnotnull) < partnatts);
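A small sketch of why the two forms in item 5 agree, under the assumption that the set never contains a bit at or beyond partnatts. The names `toy_num_members`, `include_default_loop` and `include_default_count` are hypothetical, and the set is modelled as one 64-bit word rather than a real Bitmapset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits; stands in for bms_num_members() on a one-word set. */
static int
toy_num_members(uint64_t set)
{
    int n = 0;

    while (set != 0)
    {
        set &= set - 1;         /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Loop form: true if some key in 0..partnatts-1 is not known NOT NULL. */
static bool
include_default_loop(uint64_t keyisnotnull, int partnatts)
{
    int i;

    for (i = 0; i < partnatts; i++)
    {
        if (!(keyisnotnull & (UINT64_C(1) << i)))
            return true;
    }
    return false;
}

/*
 * Counting form: equivalent as long as keyisnotnull has no members
 * >= partnatts (an assumption of this sketch).
 */
static bool
include_default_count(uint64_t keyisnotnull, int partnatts)
{
    return toy_num_members(keyisnotnull) < partnatts;
}
```

With three keys, every possible set gives the same answer from both functions; only the set containing all three keys yields false.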
6. The following comment is not well written:
* get_partitions_excluded_by_ne_datums
*
* Returns a Bitmapset of indexes of partitions that can safely be removed
* due to each such partition's every allowable non-null datum appearing in
* a <> opeartor clause.
Maybe it would be better to write:
* get_partitions_excluded_by_ne_datums
*
* Returns a Bitmapset of partition indexes that can safely be removed due to
* the discovery of <> clauses for each datum value allowed in the partition.
if not, then "opeartor" needs the spelling fixed.
7. "The following"
* Followig entry points exist to this module.
Are there any other .c files where we comment on all the extern
functions in this way? I don't recall seeing it before.
8. The following may as well just be: context.partnatts = partnatts;
context.partnatts = rel->part_scheme->partnatts;
9. Why palloc0? Wouldn't palloc be ok?
context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
context.partnatts);
Also, no need for context.partnatts, just partnatts should be fine.
10. I'd rather see bms_next_member() used here:
/* Add selected partitions' RT indexes to result. */
while ((i = bms_first_member(partindexes)) >= 0)
result = bms_add_member(result, rel->part_rels[i]->relid);
There's not really much use for bms_first_member these days. It can be
slower due to having to traverse the unset lower significant bits each
loop. bms_next_member starts where the previous loop left off.
Will try to review more tomorrow.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 February 2018 at 22:24, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 2 February 2018 at 23:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I started wondering if it's not such a good idea to make
PartitionClauseInfo a Node at all?
That sounds like a good idea.
A patch which puts this back is attached.
Please find attached an updated patch. The previous one must've got a
bit mangled in a bad merge.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
make_PartitionClauseInfo_a_nonnode_type_v2.patch (application/octet-stream)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d844b8b..1bb76dd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,25 +2132,6 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
-static PartitionClauseInfo *
-_copyPartitionClauseInfo(const PartitionClauseInfo *from)
-{
- PartitionClauseInfo *newnode = makeNode(PartitionClauseInfo);
-
- int i;
- for (i = 0; i < PARTITION_MAX_KEYS; i++)
- COPY_NODE_FIELD(keyclauses[i]);
-
- COPY_NODE_FIELD(or_clauses);
- COPY_NODE_FIELD(ne_clauses);
- COPY_BITMAPSET_FIELD(keyisnull);
- COPY_BITMAPSET_FIELD(keyisnotnull);
- COPY_SCALAR_FIELD(constfalse);
- COPY_SCALAR_FIELD(foundkeyclauses);
-
- return newnode;
-}
-
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,9 +5005,6 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
- case T_PartitionClauseInfo:
- retval = _copyPartitionClauseInfo(from);
- break;
/*
* RELATION NODES
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 71a7b7b..6e17d65 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -330,7 +330,7 @@ extract_partition_clauses(PartitionPruneContext *context, List *clauses)
PartitionClauseInfo *partclauseinfo;
ListCell *lc;
- context->clauseinfo = partclauseinfo = makeNode(PartitionClauseInfo);
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
partclauseinfo->or_clauses = NIL;
partclauseinfo->ne_clauses = NIL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 0dd6bd3..26a8df8 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -40,6 +40,40 @@ typedef struct PartitionDescData
PartitionBoundInfo boundinfo; /* collection of partition bounds */
} PartitionDescData;
+/* Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses whose at least one argument clause matches a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on a IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each members is a List itself of a given OR clauses's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
typedef struct PartitionDescData *PartitionDesc;
typedef struct PartitionPruneContext
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 81d223c..c097da6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -190,7 +190,6 @@ typedef enum NodeTag
T_JoinExpr,
T_FromExpr,
T_OnConflictExpr,
- T_PartitionClauseInfo,
T_IntoClause,
/*
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 6cfb876..1b4b0d7 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,44 +1506,4 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
-/*----------
- * PartitionClauseInfo
- *
- * Stores clauses which were matched to a partition key.
- *
- * Each matching "operator" clause is stored in the 'keyclauses' list for the
- * partition key that it was matched to, except if the operator is <>, in
- * which case, the clause is added to the 'ne_clauses' list.
- *
- * Boolean OR clauses whose at least one argument clause matches a partition
- * key are added to the 'or_clauses' list.
- *
- * Based on a IS NULL or IS NOT NULL clause that was matched to a partition
- * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
- *----------
- */
-typedef struct PartitionClauseInfo
-{
- NodeTag type;
-
- /* Lists of clauses indexed by the partition key */
- List *keyclauses[PARTITION_MAX_KEYS];
-
- /* Each members is a List itself of a given OR clauses's arguments. */
- List *or_clauses;
-
- /* List of clauses containing <> operator. */
- List *ne_clauses;
-
- /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
- Bitmapset *keyisnull;
- Bitmapset *keyisnotnull;
-
- /* True if at least one of above fields contains valid information. */
- bool foundkeyclauses;
-
- /* True if mutually contradictory clauses were found. */
- bool constfalse;
-} PartitionClauseInfo;
-
#endif /* PRIMNODES_H */
On 17 February 2018 at 22:39, David Rowley
<david.rowley@2ndquadrant.com> wrote:
10. I'd rather see bms_next_member() used here:
/* Add selected partitions' RT indexes to result. */
while ((i = bms_first_member(partindexes)) >= 0)
result = bms_add_member(result, rel->part_rels[i]->relid);
There's not really much use for bms_first_member these days. It can be
slower due to having to traverse the unset lower significant bits each
loop. bms_next_member starts where the previous loop left off.
Will try to review more tomorrow.
As I mentioned yesterday, here's the remainder of the review:
11. The following comment is misleading. It says: "We currently handle
two such cases:", then it goes on to say the 2nd case is not handled.
/*
* Handle cases where the clause's operator does not belong to
* the partitioning operator family. We currently handle two
* such cases: 1. Operators named '<>' are not listed in any
* operator family whatsoever, 2. Ordering operators like '<'
* are not listed in the hash operator families. For 1, check
* if list partitioning is in use and if so, proceed to pass
* the clause to the caller without doing any more processing
* ourselves. 2 cannot be handled at all, so the clause is
* simply skipped.
*/
12. The following code should test for LIST partitioning before doing
anything else:
if (!op_in_opfamily(opclause->opno, partopfamily))
{
Oid negator;
/*
* To confirm if the operator is really '<>', check if its
* negator is a equality operator. If it's a btree
* equality operator *and* this is a list partitioned
* table, we can use it prune partitions.
*/
negator = get_negator(opclause->opno);
if (OidIsValid(negator) &&
op_in_opfamily(negator, partopfamily))
{
Oid lefttype;
Oid righttype;
int strategy;
get_op_opfamily_properties(negator, partopfamily,
false,
&strategy,
&lefttype, &righttype);
if (strategy == BTEqualStrategyNumber &&
context->strategy == PARTITION_STRATEGY_LIST)
is_ne_listp = true;
}
/* Cannot handle this clause. */
if (!is_ne_listp)
continue;
}
The code should probably be in the form of:
if (!op_in_opfamily(opclause->opno, partopfamily))
{
if (context->strategy != PARTITION_STRATEGY_LIST)
continue;
...
if (strategy == BTEqualStrategyNumber)
is_ne_listp = true;
}
that way we'll save 3 syscache lookups when a <> clause appears in a
RANGE or HASH partitioned table.
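The reordering in item 12 can be sketched as follows. This is a toy model, not the patch code: `negator_is_btree_equality` is a hypothetical helper that merely counts three simulated catalog lookups (standing in for the get_negator(), op_in_opfamily() and get_op_opfamily_properties() calls), and operator number 42 is an arbitrary placeholder for '<>'.

```c
#include <stdbool.h>

typedef enum
{
    TOY_STRATEGY_LIST,
    TOY_STRATEGY_RANGE,
    TOY_STRATEGY_HASH
} toy_strategy;

/* Number of simulated catalog lookups performed so far. */
static int lookup_count = 0;

/*
 * Hypothetical expensive check: pretends to do the three lookups needed to
 * confirm that the operator's negator is a btree equality operator.
 */
static bool
negator_is_btree_equality(int opno)
{
    lookup_count += 3;
    return opno == 42;          /* pretend operator 42 is '<>' */
}

/*
 * Test the cheap, already-known partitioning strategy first, so that the
 * lookups are skipped entirely for RANGE and HASH partitioned tables.
 */
static bool
toy_is_ne_listp(toy_strategy strategy, int opno)
{
    if (strategy != TOY_STRATEGY_LIST)
        return false;

    return negator_is_btree_equality(opno);
}
```

The result is the same as with the original ordering; only the amount of work done for non-LIST tables changes.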
13. The following code makes assumptions that the partitioning op
family is btree:
/*
* In case of NOT IN (..), we get a '<>', which while not
* listed as part of any operator family, we are able to
* handle it if its negator is an equality operator that
* is in turn part of the operator family.
*/
if (!op_in_opfamily(saop_op, partopfamily))
{
Oid negator = get_negator(saop_op);
int strategy;
Oid lefttype,
righttype;
if (!OidIsValid(negator))
continue;
get_op_opfamily_properties(negator, partopfamily, false,
&strategy,
&lefttype, &righttype);
if (strategy != BTEqualStrategyNumber)
continue;
}
this might not be breakable today, but it could well break in the
future, for example, if hash op family managed to grow two more
strategies, then we could get a false match on the matching strategy
numbers (both 3).
14. The following code assumes there will be a right op:
if (IsA(clause, OpExpr))
{
OpExpr *opclause = (OpExpr *) clause;
Expr *leftop,
*rightop,
*valueexpr;
bool is_ne_listp = false;
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
rightop = (Expr *) get_rightop(clause);
This'll crash with the following:
create function nonzero(p int) returns bool as $$ begin return (p <>
0); end;$$ language plpgsql;
create operator # (procedure = nonzero, leftarg = int);
create table listp (a int) partition by list (a);
create table listp1 partition of listp for values in(1);
select * from listp where a#;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
You need to ensure that there are 2 args.
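A minimal sketch of the kind of guard meant in item 14, assuming the crash comes from using the missing right operand of a unary operator. The struct `toy_opexpr` and both functions are hypothetical simplifications; real OpExpr nodes keep their arguments in a List.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operator expression with at most two operands. */
typedef struct toy_opexpr
{
    int         nargs;          /* 1 for a unary operator, 2 for binary */
    const char *args[2];        /* operand names; unused slots are NULL */
} toy_opexpr;

/* Return the right operand, or NULL if the operator has none. */
static const char *
toy_get_rightop(const toy_opexpr *op)
{
    return (op->nargs >= 2) ? op->args[1] : NULL;
}

/*
 * Only binary operator clauses are candidates for matching against a
 * partition key; anything else is skipped before its operands are touched.
 */
static bool
toy_can_match_partition_key(const toy_opexpr *op)
{
    if (op->nargs != 2)
        return false;

    return op->args[0] != NULL && toy_get_rightop(op) != NULL;
}
```

With such a check in place, a clause like `a#` from the example above would simply be skipped rather than examined further.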
15. Various if tests in extract_partition_clauses result in a
`continue` when they should perhaps be a `break` instead.
For example:
if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
continue;
This clause does not depend on the partition key, so there's no point
in trying to match this again for the next partition key.
This item is perhaps not so important, as it's only a small
inefficiency, but I just wanted to point it out. Also, note that plain
Var partition keys cannot be duplicated, but expressions can, so there
may be cases that you don't want to change to `break`.
Other conditions which possibly should change to `break` instead of
`continue` include:
/* Only IS [NOT] TRUE/FALSE are any good to us */
if (btest->booltesttype == IS_UNKNOWN ||
btest->booltesttype == IS_NOT_UNKNOWN)
continue;
16. The following code looks a bit fragile:
leftop = IsA(clause, Var)
? (Expr *) clause
: (Expr *) get_notclausearg((Expr *) clause);
This appears to assume that the partition key will be a plain Var and
not an expression. I tried to break this with:
create table bp (a bool, b bool) partition by list ((a < b));
create table bp_true partition of bp for values in('t');
explain select * from bp where (a < b);
however, naturally, the parser builds an OpExpr instead of a
BooleanTest for this case. If it had built a BooleanTest, then the
above code would mistakenly call get_notclausearg on the (a < b) Expr.
Do you have reason to believe that the code is safe and a good idea?
17. Which relation is the comment talking about?
/*
* get_partitions_from_args
*
* Returns the set of partitions of relation, each of which satisfies some
* clause in or_args.
*/
static Bitmapset *
get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
18. "sets a field", would it not be better to mention constfalse?:
* returns right after finding such a clause and before returning, sets a field
* in context->clauseinfo to inform the caller that we found such clause.
19. "clauses"
* partitioning, we don't require all of eqkeys to be operator clausses.
20. There does not seem to be a need to palloc0 here. palloc seems fine.
keys->ne_datums = (Datum *)
palloc0(list_length(clauseinfo->ne_clauses) *
sizeof(Datum));
This, of course, may leave unset memory in any unused items, but you
never iterate beyond what n_ne_datums gets set to anyway, so I don't
see the need to zero any extra elements.
21. A code comment should be added to the following code to mention
that these are not arrays indexed by partition key as they're only
ever used for LIST partitioning, which only supports a single key.
/* Datum values from clauses containing <> operator */
Datum *ne_datums;
int n_ne_datums;
22. Can you include: "'keyisnotnull' may also be set for the given
partition key when a strict OpExpr is encountered" in the following
comment?
* Based on a IS NULL or IS NOT NULL clause that was matched to a partition
* key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2018/02/17 18:24, David Rowley wrote:
On 2 February 2018 at 23:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I started wondering if it's not such a good idea to make
PartitionClauseInfo a Node at all? I went back to your earlier message
[1] where you said that it's put into the Append node for run-time pruning
to use, but it doesn't sound nice that we'd be putting into the plan
something that looks more like a scratchpad for the partition.c code. I
think we should try to keep PartitionClauseInfo in partition.h and put
only the list of matched bare clauses into Append.
That sounds like a good idea.
A patch which puts this back is attached.
I've changed the run-time prune patch to process the clause lists
during execution instead.
Thank you. I'll incorporate it in the version I'll send next.
Regards,
Amit
Hi David.
Thanks a lot for the review comments. Replying to all of your comments.
On 2018/02/17 18:39, David Rowley wrote:
On 15 February 2018 at 18:57, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Here is an updated version.
Thanks for sending v27. I've had a quick look over it while I was
working on the run-time prune patch. However, I've not quite managed a
complete pass of this version yet.
A couple of things so far:
1. Following loop;
for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
/* Only the default partition accepts nulls. */
if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
}
}
could become:
if (partition_bound_has_default(boundinfo) &&
!bms_is_empty(keys->keyisnull))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
2. Is the following form of loop necessary?
for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
}
Can't this just be:
i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
Perhaps you can just Assert(i < partnatts), if you're worried about that.
Similar code exists in get_partitions_for_keys_range
Both 1 and 2 are good suggestions, so done that way.
3. Several comments mention partition_bound_bsearch, but there is now
no such function.
4. "us" should be "is"
* not be any unassigned range to speak of, because the range us unbounded
Fixed.
5. The following code is more complex than it needs to be:
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* could be null.
*/
if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
{
for (i = 0; i < partnatts; i++)
{
if (!bms_is_member(i, keys->keyisnotnull))
{
include_def = true;
break;
}
}
}
Instead of the for loop, couldn't you just write:
include_def = (bms_num_members(keys->keyisnotnull) < partnatts);
Indeed, it's much simpler. Though, I wrote it as:
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ include_def = true;
6. The following comment is not well written:
* get_partitions_excluded_by_ne_datums
*
* Returns a Bitmapset of indexes of partitions that can safely be removed
* due to each such partition's every allowable non-null datum appearing in
* a <> opeartor clause.
Maybe it would be better to write:
* get_partitions_excluded_by_ne_datums
*
* Returns a Bitmapset of partition indexes that can safely be removed due to
* the discovery of <> clauses for each datum value allowed in the partition.
if not, then "opeartor" needs the spelling fixed.
Sure, your rewrite sounds much better.
7. "The following"
* Followig entry points exist to this module.
Fixed.
Are there any other .c files where we comment on all the extern
functions in this way? I don't recall seeing it before.
Hmm, not like the way this patch does, but some files in the executor do
have such introductory comments using description of functions exported by
the module. See for example, nodeSeqscan.c, execMain.c, etc.
Not saying that this is the best way to introduce the module, but this is
the one I went with for now. If this format is not very informative, I'm
willing to rewrite it some other way.
8. The following may as well just: context.partnatts = partnatts;
context.partnatts = rel->part_scheme->partnatts;
9. Why palloc0? Wouldn't palloc be ok?
context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
context.partnatts);
Also, no need for context.partnatts, just partnatts should be fine.
Fixed both. Yes, palloc suffices.
10. I'd rather see bms_next_member() used here:
/* Add selected partitions' RT indexes to result. */
while ((i = bms_first_member(partindexes)) >= 0)
result = bms_add_member(result, rel->part_rels[i]->relid);
There's not really much use for bms_first_member these days. It can be
slower due to having to traverse the unset lower significant bits each
loop. bms_next_member starts where the previous loop left off.
Thanks for clarifying. Used bms_next_member().
Will try to review more tomorrow.
On 2018/02/18 11:25, David Rowley wrote:
As I mentioned yesterday, here's the remainder of the review:
11. The following comment is misleading. It says: "We currently handle
two such cases:", then it goes on to say the 2nd case is not handled.
/*
* Handle cases where the clause's operator does not belong to
* the partitioning operator family. We currently handle two
* such cases: 1. Operators named '<>' are not listed in any
* operator family whatsoever, 2. Ordering operators like '<'
* are not listed in the hash operator families. For 1, check
* if list partitioning is in use and if so, proceed to pass
* the clause to the caller without doing any more processing
* ourselves. 2 cannot be handled at all, so the clause is
* simply skipped.
*/
You're right. We don't really "handle" 2. Rewrote the comment like this:
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
12. The following code should test for LIST partitioning before doing
anything else:
if (!op_in_opfamily(opclause->opno, partopfamily))
{
Oid negator;
/*
* To confirm if the operator is really '<>', check if its
* negator is a equality operator. If it's a btree
* equality operator *and* this is a list partitioned
* table, we can use it prune partitions.
*/
negator = get_negator(opclause->opno);
if (OidIsValid(negator) &&
op_in_opfamily(negator, partopfamily))
{
Oid lefttype;
Oid righttype;
int strategy;
get_op_opfamily_properties(negator, partopfamily,
false,
&strategy,
&lefttype, &righttype);
if (strategy == BTEqualStrategyNumber &&
context->strategy == PARTITION_STRATEGY_LIST)
is_ne_listp = true;
}
/* Cannot handle this clause. */
if (!is_ne_listp)
continue;
}
The code should probably be in the form of:
if (!op_in_opfamily(opclause->opno, partopfamily))
{
if (context->strategy != PARTITION_STRATEGY_LIST)
continue;
...
if (strategy == BTEqualStrategyNumber)
is_ne_listp = true;
}
that way we'll save 3 syscache lookups when a <> clause appears in a
RANGE or HASH partitioned table.
Good idea, done.
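To illustrate the point of item 12 outside the planner, here is a small standalone sketch of "cheap test first, catalog lookup second". Everything in it is hypothetical: ToyStrategy, toy_negator_is_equality, the made-up operator number 42, and the counter that stands in for syscache lookups. It is not the patch's code, only a model of the ordering being suggested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are PostgreSQL APIs. */
typedef enum
{
    TOY_STRATEGY_HASH,
    TOY_STRATEGY_LIST,
    TOY_STRATEGY_RANGE
} ToyStrategy;

int         toy_lookup_count = 0;   /* counts simulated catalog lookups */

/* Pretend catalog lookup: reports whether the operator's negator is '='. */
bool
toy_negator_is_equality(int opno)
{
    assert(opno > 0);
    toy_lookup_count++;
    return opno == 42;          /* 42 is the made-up number of '<>' here */
}

/* Returns true if a '<>' clause could be used for pruning in this model. */
bool
toy_ne_clause_usable(ToyStrategy strategy, int opno)
{
    /* Cheap test first: only list partitioning can use '<>' at all. */
    if (strategy != TOY_STRATEGY_LIST)
        return false;

    /* Only now pay for the (simulated) catalog lookup. */
    return toy_negator_is_equality(opno);
}
```

With this ordering, a RANGE or HASH partitioned table never reaches the lookup, which is the saving described above.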
13. The following code makes assumptions that the partitioning op
family is btree:

    /*
     * In case of NOT IN (..), we get a '<>', which while not
     * listed as part of any operator family, we are able to
     * handle it if its negator is an equality operator that
     * is in turn part of the operator family.
     */
    if (!op_in_opfamily(saop_op, partopfamily))
    {
        Oid         negator = get_negator(saop_op);
        int         strategy;
        Oid         lefttype,
                    righttype;

        if (!OidIsValid(negator))
            continue;
        get_op_opfamily_properties(negator, partopfamily, false,
                                   &strategy,
                                   &lefttype, &righttype);
        if (strategy != BTEqualStrategyNumber)
            continue;
    }

this might not be breakable today, but it could well break in the
future, for example, if hash op family managed to grow two more
strategies, then we could get a false match on the matching strategy
numbers (both 3).
I guess we'd be able to do anything useful with a NOT IN (..) only with list
partitioning, so I changed the code and surrounding comments like I did
per your comment above. So, if we process it only if list partitioning is
used, the existing code can safely assume that partitioning operator
family is btree.
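As an aside on why item 13 matters: a strategy number only has meaning together with the access method of the operator family it came from. The sketch below is a standalone model, not what the patch does (the patch restricts the check to list partitioning, where the family is known to be btree). The two #define values match PostgreSQL's access/stratnum.h; ToyAccessMethod and toy_is_equality_strategy are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as defined in PostgreSQL's access/stratnum.h. */
#define BTEqualStrategyNumber   3
#define HTEqualStrategyNumber   1

/* Hypothetical tag for the kind of operator family being examined. */
typedef enum
{
    TOY_AM_BTREE,
    TOY_AM_HASH
} ToyAccessMethod;

/*
 * Interpret a strategy number in the context of its access method, so that
 * a bare "3" is never taken to mean equality for a non-btree family.
 */
bool
toy_is_equality_strategy(ToyAccessMethod am, int strategy)
{
    assert(strategy > 0);

    switch (am)
    {
        case TOY_AM_BTREE:
            return strategy == BTEqualStrategyNumber;
        case TOY_AM_HASH:
            return strategy == HTEqualStrategyNumber;
    }
    return false;
}
```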
14. The following code assumes there will be a right op:
    if (IsA(clause, OpExpr))
    {
        OpExpr     *opclause = (OpExpr *) clause;
        Expr       *leftop,
                   *rightop,
                   *valueexpr;
        bool        is_ne_listp = false;

        leftop = (Expr *) get_leftop(clause);
        if (IsA(leftop, RelabelType))
            leftop = ((RelabelType *) leftop)->arg;
        rightop = (Expr *) get_rightop(clause);

This'll crash with the following:

    create function nonzero(p int) returns bool as $$ begin return (p <> 0); end;$$ language plpgsql;
    create operator # (procedure = nonzero, leftarg = int);
    create table listp (a int) partition by list (a);
    create table listp1 partition of listp for values in(1);
    select * from listp where a#;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.

You need to ensure that there are 2 args.
Oops, so we can't prune with unary operator clause. Added the check on
number of args.
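A rough standalone model of the guard, for illustration only. ToyOpExpr, toy_make_opexpr, toy_clause_is_binary and toy_rightop are hypothetical names and the operands are plain integers; the real check in the patch is on the OpExpr's argument list, whose exact form is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an operator clause node. */
typedef struct ToyOpExpr
{
    int         nargs;      /* 1 for a unary clause like "a#", 2 for "a = 1" */
    int         args[2];    /* operand placeholders; args[1] unused if unary */
} ToyOpExpr;

/* Build a clause with the given number of operands (helper for the sketch). */
ToyOpExpr
toy_make_opexpr(int nargs)
{
    ToyOpExpr   op = {nargs, {10, 20}};

    return op;
}

/* Only binary operator clauses are candidates for pruning in this model. */
bool
toy_clause_is_binary(ToyOpExpr op)
{
    return op.nargs == 2;
}

/* Fetch the right operand; callers must have checked the arity first. */
int
toy_rightop(ToyOpExpr op)
{
    assert(op.nargs == 2);
    return op.args[1];
}
```

The idea is simply that the arity test comes before any attempt to look at a right operand, so a unary clause is skipped rather than dereferenced.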
15. Various if tests in extract_partition_clauses result in a
`continue` when they should perhaps be a `break` instead.

For example:

    if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
        continue;

This clause does not depend on the partition key, so there's no point
in trying to match this again for the next partition key.

This item is perhaps not so important, as it's only a small
inefficiency, but I just wanted to point it out. Also, note that plain
Var partition keys cannot be duplicated, but expressions can, so there
may be cases that you don't want to change to `break`.

Other conditions which possibly should change to `break` instead of
`continue` include:

    /* Only IS [NOT] TRUE/FALSE are any good to us */
    if (btest->booltesttype == IS_UNKNOWN ||
        btest->booltesttype == IS_NOT_UNKNOWN)
        continue;
You're quite right. Replaced all those continue's with break.
16. The following code looks a bit fragile:
    leftop = IsA(clause, Var)
        ? (Expr *) clause
        : (Expr *) get_notclausearg((Expr *) clause);

This appears to assume that the partition key will be a plain Var and
not an expression. I tried to break this with:

    create table bp (a bool, b bool) partition by ((a < b));
    create table bp_true partition of bp for values in('t');
    explain select * from bp where (a < b);

however, naturally, the parser builds an OpExpr instead of a
BooleanTest for this case. If it had built a BooleanTest, then the
above code would mistakenly call get_notclausearg on the (a < b) Expr.

Do you have reason to believe that the code is safe and a good idea?
Thanks for the example. I think there were problems here. The way the
current code matched these specially shaped clauses with a Boolean
partition key wouldn't have resulted in correct pruning for your example
and perhaps more cases.
I looked at match_boolean_index_clause() and decided that we needed
something similar here. So, in the updated patch, I've added a
match_boolean_partition_clause(), which will be called at the beginning of
the loop to recognize such specially shaped clauses as applicable to a
Boolean partition key and add them to the set of pruning clauses. With
the new code:
create table bp (a int, b int) partition by list ((a < b));
create table bp_true partition of bp for values in ('true');
create table bp_false partition of bp for values in ('false');

explain select * from bp where (a < b) is true;
                           QUERY PLAN
----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_true (cost=0.00..38.25 rows=753 width=8)
         Filter: ((a < b) IS TRUE)
(3 rows)

explain select * from bp where (a < b) is false;
                            QUERY PLAN
------------------------------------------------------------------
 Append (cost=0.00..38.25 rows=1507 width=8)
   -> Seq Scan on bp_false (cost=0.00..38.25 rows=1507 width=8)
         Filter: ((a < b) IS FALSE)
(3 rows)

explain select * from bp where (a < b) is not false;
                           QUERY PLAN
----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_true (cost=0.00..38.25 rows=753 width=8)
         Filter: ((a < b) IS NOT FALSE)
(3 rows)

explain select * from bp where (a < b) = true;
                           QUERY PLAN
----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_true (cost=0.00..38.25 rows=753 width=8)
         Filter: (a < b)
(3 rows)

explain select * from bp where (a < b) = false;
                           QUERY PLAN
-----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_false (cost=0.00..38.25 rows=753 width=8)
         Filter: (a >= b)
(3 rows)

explain select * from bp where (a < b);
                           QUERY PLAN
----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_true (cost=0.00..38.25 rows=753 width=8)
         Filter: (a < b)
(3 rows)

explain select * from bp where not (a < b);
                           QUERY PLAN
-----------------------------------------------------------------
 Append (cost=0.00..38.25 rows=753 width=8)
   -> Seq Scan on bp_false (cost=0.00..38.25 rows=753 width=8)
         Filter: (a >= b)
(3 rows)
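The plans above can be summarized as a mapping from clause shape to the Boolean value the partition key is effectively compared with. The following standalone sketch encodes just that mapping. It is not match_boolean_partition_clause() itself, whose body is not shown in this message; ToyClauseShape and toy_boolean_key_value are hypothetical names, and only the shapes appearing in the plans above (plus IS UNKNOWN, which the review notes is of no use) are modelled.

```c
#include <assert.h>

/* Hypothetical classification of clauses over a Boolean partition key. */
typedef enum
{
    TOY_BARE_KEY,           /* (a < b)               */
    TOY_NOT_KEY,            /* not (a < b)           */
    TOY_KEY_EQ_TRUE,        /* (a < b) = true        */
    TOY_KEY_EQ_FALSE,       /* (a < b) = false       */
    TOY_KEY_IS_TRUE,        /* (a < b) is true       */
    TOY_KEY_IS_FALSE,       /* (a < b) is false      */
    TOY_KEY_IS_NOT_FALSE,   /* (a < b) is not false  */
    TOY_KEY_IS_UNKNOWN      /* (a < b) is unknown    */
} ToyClauseShape;

/*
 * Returns 1 if the clause selects the 'true' partition, 0 if it selects the
 * 'false' partition, and -1 if the shape is not usable for pruning here.
 */
int
toy_boolean_key_value(ToyClauseShape shape)
{
    switch (shape)
    {
        case TOY_BARE_KEY:
        case TOY_KEY_EQ_TRUE:
        case TOY_KEY_IS_TRUE:
        case TOY_KEY_IS_NOT_FALSE:
            return 1;           /* plans above scan bp_true */
        case TOY_NOT_KEY:
        case TOY_KEY_EQ_FALSE:
        case TOY_KEY_IS_FALSE:
            return 0;           /* plans above scan bp_false */
        case TOY_KEY_IS_UNKNOWN:
            return -1;          /* only IS [NOT] TRUE/FALSE are useful */
    }
    return -1;
}
```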
17. Which relation is the comment talking about?
/*
* get_partitions_from_args
*
* Returns the set of partitions of relation, each of which satisfies some
* clause in or_args.
*/
static Bitmapset *
get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
Fixed the comment.
18. "sets a field", would it not be better to mention constfalse?:
* returns right after finding such a clause and before returning, sets a field
* in context->clauseinfo to inform the caller that we found such clause.
You're right.
19. "clauses"
* partitioning, we don't require all of eqkeys to be operator clausses.
Fixed.
20. There does not seem to be a need to palloc0 here. palloc seems fine.
    keys->ne_datums = (Datum *)
        palloc0(list_length(clauseinfo->ne_clauses) *
                sizeof(Datum));

This, of course, may leave unset memory in any unused items, but you
never iterate beyond what n_ne_datums gets set to anyway, so I don't
see the need to zero any extra elements.
Agreed about using palloc.
21. A code comment should be added to the following code to mention
that these are not arrays indexed by partition key as they're only
ever used for LIST partitioning, which only supports a single key.

    /* Datum values from clauses containing <> operator */
    Datum      *ne_datums;
    int         n_ne_datums;
OK, done. Also adjusted the comment above the struct's definition.
22. Can you include: "'keyisnotnull' may also be set for the given
partition key when a strict OpExpr is encountered" in the following
comment?

 * Based on a IS NULL or IS NOT NULL clause that was matched to a partition
 * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set.
Done.
Attached updated patches. Thanks again!
Regards,
Amit
Attachments:
v28-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From c0cca70abfa62e03322d9be75281524c69756b9f Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v28 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 4dddfcc014..af6e611147 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1113,8 +1115,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1174,7 +1177,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2811,7 +2817,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2820,6 +2828,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2828,7 +2840,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2838,7 +2850,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2859,8 +2871,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2884,9 +2896,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2900,8 +2917,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2978,7 +2995,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3022,7 +3040,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
v28-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From 2aea8dab0dece41d41f36a4febe9b3f1790f79b9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v28 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signature and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index af6e611147..1799aa2c0e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/*
* RelationBuildPartitionDesc
@@ -1004,7 +1006,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1080,7 +1082,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1155,7 +1159,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2574,7 +2581,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2590,7 +2599,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2619,11 +2629,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2937,7 +2949,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2952,8 +2964,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2980,7 +2992,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2995,8 +3008,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3037,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3040,8 +3052,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3068,8 +3080,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3267,27 +3278,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
v28-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From f5226fce1e7b2b9f7b37dd0a0eaf0d12d782a56f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v28 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(bool) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v28-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From c5d4df74c8bc6b4f2f263147f45744f4d773aa43 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v28 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
Dilip Kumar (dilipbalaut@gmail.com),
David Rowley (david.rowley@2ndquadrant.com)
---
src/backend/catalog/partition.c | 648 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1506 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 430 ++++++-
src/test/regress/sql/partition_prune.sql | 77 +-
15 files changed, 2794 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1799aa2c0e..a10eaad570 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,648 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the index of partitions that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of partitions excluded due to each of
+ * those partitions' *all* of allowed datums appearing in
+ * keys->ne_datums, that is compared to the partition key
+ * using <> operator.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ keyisnull[i] = true;
+ }
+ Assert(i < partnatts);
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /*
+ * If the query is looking for null keys, there can only be one such
+ * partition. Return the same if one exists.
+ */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* Exactly matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * The bound at minoff is <= minkeys, given the way
+ * partition_list_bsearch() works. If it's not equal (<), then
+ * increment minoff to make it point to the datum on the right
+ * that necessarily satisfies minkeys. Also do the same if it is
+ * equal but minkeys is exclusive.
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys,
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * minkeys is greater than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * The bound at maxoff is <= maxkeys, given the way
+ * partition_list_bsearch() works. If the bound at maxoff exactly
+ * matches maxkeys (is_equal), but maxkeys is exclusive, then
+ * decrement maxoff to point to the bound on the left.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal,
+ include_def = false;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many equality keys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkeys is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff += 1;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff -= 1;
+ else
+ minoff += 1;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make
+ * maxoff point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but that may not be the case. It might actually be the
+ * upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey = partnatts - 1;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE)
+ include_def = true;
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ {
+ include_def = true;
+ break;
+ }
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * *could* be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ include_def = true;
+
+ if (include_def && partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..0120d35cf2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Bitmapset *live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..ffbcd12604
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1506 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points to this module exist.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * effort only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for pruning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. Especially, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause that matches a
+ * partition key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns a Bitmapset of the RT indexes of relations belonging to the
+ * minimum set of partitions which must be scanned to satisfy rel's
+ * baserestrictinfo quals.
+ */
+Bitmapset *
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Bitmapset *result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to the particular
+ * key in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing the <>
+ * operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo =
+ palloc0(sizeof(PartitionClauseInfo));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Clause is useless if its collation is different from the
+ * partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ break;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey))
+ continue;
+
+ /*
+ * Also, useless, if the clause's collation is different from
+ * the partitioning collation.
+ */
+ if (!PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ break;
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values; 'i' still indexes the key. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing a true or false value and returns true
+ * if we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields except clauseinfo are the same as in the parent context;
+ * clauseinfo is set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in context->clauseinfo to inform the caller that we found such
+ * a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 7 = 5 is false, we found a mutual contradiction,
+ * so we set constfalse in clauseinfo to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing one. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add the non-NULL ones to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..ed27ca921e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual, if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..5c0d469600
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Bitmapset *prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bc9ff38253 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,355 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..b7c5abf378 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,79 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp;
--
2.11.0
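
As an aside on the hash partitioning tests in the patch above: the expected
output suggests that hash partitions are pruned only when every partition key
column is matched by either an equality clause or an IS NULL clause. A minimal
sketch of that rule in C follows. It is a toy illustration and not code from
the patch; the function name hash_pruning_possible and the bitmask encoding of
matched keys are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Bit i of eq_mask is set when partition key i is matched by an equality
 * clause; bit i of null_mask is set when it is matched by an IS NULL clause.
 * Returns true only if every one of the partnatts keys is covered by one of
 * the two, which is the condition under which the tests above show pruning.
 */
static bool
hash_pruning_possible(int partnatts, unsigned int eq_mask,
                      unsigned int null_mask)
{
    unsigned int covered = eq_mask | null_mask;
    int          i;

    /* Assumed limit for this sketch: at most 32 key columns. */
    assert(partnatts >= 0 && partnatts <= 32);

    for (i = 0; i < partnatts; i++)
    {
        /* A key with neither equality nor IS NULL blocks pruning. */
        if (((covered >> i) & 1u) == 0)
            return false;
    }

    return true;
}
```

With hp partitioned by hash (a, b), taking bit 0 for a and bit 1 for b, the
clause "a = 1" alone corresponds to hash_pruning_possible(2, 0x1, 0x0), which
returns false, while "a = 1 and b is null" corresponds to
hash_pruning_possible(2, 0x1, 0x2), which returns true. Both results are
consistent with the plans in the expected output.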
Attachment: v28-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 5633f5951083d9cbf8f3673c96b60ed06639fcf4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v28 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that should be known.
That means we no longer need the PartitionedChildRelInfo node or the
related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 106 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..1bb76dd4f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5039,9 +5024,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0120d35cf2..d40429ad6d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..646d118a5f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Bitmapset *partitioned_rels_bms = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_rels_bms = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1380,7 +1382,6 @@ inheritance_planner(PlannerInfo *root)
parent_relids =
bms_add_member(parent_relids, appinfo->child_relid);
parent_roots[appinfo->child_relid] = subroot;
-
continue;
}
@@ -1427,6 +1428,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_rels_bms = bms_add_member(partitioned_rels_bms,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1532,20 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_rels_bms)
+ {
+ int parent_rti;
+
+ while ((parent_rti = bms_first_member(partitioned_rels_bms)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, parent_rti);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1553,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5950,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
On 19 February 2018 at 22:19, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches. Thanks again!
Thanks for making those changes. I've made another pass of v28 and
have a few more comments.
The patch is starting to look good, but there are some new changes in
recent versions which still don't look quite right.
1. This does not fully make sense:
/*
* Remove the indexes of partitions excluded due to each of
* those partitions' *all* of allowed datums appearing in
* keys->ne_datums, that is compared to the partition key
* using <> operator.
*/
"each of those partitions' *all* of allowed" is not correct.
Maybe better to write:
/*
* Remove the indexes of any partitions which cannot possibly
* contain rows matching the clauses due to key->ne_datums containing
* all datum values which are allowed in the given partition. This
* is only possible to do in LIST partitioning as it's the only
* partitioning strategy which allows the specification of exact values.
*/
2. Mine does not, but some compilers may complain about the result
variable being uninitialised in get_partitions_for_keys(). Probably the
easiest fix would be to just assign NULL to it in the default case.
3. Did you mean to put this Assert() inside the loop?
memset(keyisnull, false, sizeof(keyisnull));
i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
Assert(i < partnatts);
i will always be -2 at the end of the loop. Seems like a useless
Assert in its current location.
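To illustrate why the Assert is vacuous after the loop, here is a minimal
sketch. It assumes a toy 32-bit set in place of a real Bitmapset;
toy_next_member() and mark_null_keys() are hypothetical names that only
mimic the bms_next_member() contract of returning -2 once no members
remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bms_next_member(): returns the smallest member
 * greater than prevbit, or -2 when there are none left.
 */
static int
toy_next_member(uint32_t set, int prevbit)
{
    for (int bit = prevbit + 1; bit < 32; bit++)
    {
        if (set & (UINT32_C(1) << bit))
            return bit;
    }
    return -2;
}

/*
 * Marks keyisnull[] for each member of 'set' and returns the value that the
 * loop variable holds once the loop has finished.
 */
static int
mark_null_keys(uint32_t set, bool *keyisnull, int partnatts)
{
    int         i = -1;

    while ((i = toy_next_member(set, i)) >= 0)
    {
        /* The bounds check only means something while i is a real member. */
        assert(i < partnatts);
        keyisnull[i] = true;
    }

    /* Here i is always -2, so "i < partnatts" could never fail. */
    return i;
}
```

With the check inside the loop body, an out-of-range member would be
caught before it is used to index keyisnull[].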
4. Can you add a comment here to say: "Note: LIST partitioning only
supports a single partition key, therefore this function requires no
looping over the partition keys"
/*
* get_partitions_for_keys_list
* Return partitions of a list partitioned table for requested keys
*
* This interprets the keys and looks up partitions in the partition bound
* descriptor using the list partitioning semantics.
*/
5. The following comment contains a bit of duplication to the comment
which comes after it. Maybe the following:
/*
* If the query is looking for null keys, there can only be one such
* partition. Return the same if one exists.
*/
can be changed to:
/* Handle clauses requesting a NULL valued partition key */
6. The following comment does not quite make sense:
/* Exactly matching datum exists. */
Probably better to write:
/* An exact matching datum exists. */
7. "If it's not equal (<)" I think you mean (>), not (<), in:
* The bound at minoff is <= minkeys, given the way
* partition_list_bsearch() works. If it's not equal (<), then
* increment minoff to make it point to the datum on the right
* that necessarily satisfies minkeys. Also do the same if it is
* equal but minkeys is exclusive.
However, the comment is a bit clumsy. Maybe the following is better?
/*
* partition_list_bsearch returning a positive number means that
* minkeys[0] must be greater than or equal to the smallest datum.
* If we didn't find an exact matching datum (!is_equal) or if the
* operator used was non-inclusive (>), then in both of these
* cases we're not interested in the datum pointed to by minoff,
* but we may start getting matches in the partition which the
* next datum belongs to, so point to that one instead. (This may
* be beyond the last datum in the array, but we'll detect that
* later.)
*/
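The adjustment described in that comment can be sketched as follows. This
is only an illustration over a sorted array of ints: toy_list_bsearch() and
toy_min_offset() are hypothetical stand-ins, and the only behaviour assumed
of partition_list_bsearch() is the one stated above, namely that it returns
the offset of the greatest datum <= the key (or -1 if there is none) and
reports whether the match was exact.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for partition_list_bsearch(): returns the offset of
 * the greatest datum that is <= key, or -1 if every datum is greater.
 * *is_equal is set to true only when the datum at that offset equals key.
 * datums[] must be sorted in ascending order without duplicates.
 */
static int
toy_list_bsearch(const int *datums, int ndatums, int key, bool *is_equal)
{
    int         lo = -1;
    int         hi = ndatums - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (datums[mid] <= key)
        {
            lo = mid;
            *is_equal = (datums[mid] == key);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Offset of the first datum that can satisfy "col >= minkey" (inclusive)
 * or "col > minkey" (exclusive).  The result may equal ndatums, meaning
 * that no datum qualifies; the caller has to detect that.
 */
static int
toy_min_offset(const int *datums, int ndatums, int minkey, bool inclusive)
{
    bool        is_equal;
    int         minoff = toy_list_bsearch(datums, ndatums, minkey, &is_equal);

    /*
     * No exact match, or an exact match that the operator excludes: the
     * datum at minoff cannot qualify, so start from the next one.  This
     * also turns a "not found" result of -1 into offset 0.
     */
    if (!is_equal || !inclusive)
        minoff++;

    return minoff;
}
```

For datums {10, 20, 30}, a minkey of 20 gives offset 1 when inclusive and
offset 2 when exclusive, and a minkey of 35 gives offset 3, which is one
past the last datum.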
8. The following comment could be improved:
* minkeys is greater than the datums of all non-default partitions,
* meaning there isn't one to return. Return the default partition if
* one exists.
how about:
* The value of minkeys[0] is greater than all of the datums we have
* partitions for. The only possible partition that could contain a
* match is the default partition. Return that, if it exists.
9. The following could also be improved:
* The bound at maxoff is <= maxkeys, given the way
* partition_list_bsearch works. If the bound at maxoff exactly
* matches maxkey (is_equal), but the maxkey is exclusive, then
* decrement maxoff to point to the bound on the left.
how about:
* partition_list_bsearch returning a positive number means that
* maxkeys[0] must be greater than or equal to the smallest datum.
* If the match found is an equal match, but the operator used is
* non-inclusive of that value (<), then the partition belonging
* to maxoff cannot match, so we'll decrement maxoff to point to
* the partition belonging to the previous datum. We might end up
* decrementing maxoff down to -1, but we'll handle that later.
10. Can you append " This may not technically be true for some data
types (e.g. integer types), however, we currently lack any sort of
infrastructure to provide us with proofs that would allow us to do
anything smarter here." to:
* For range queries, always include the default list partition,
* because list partitions divide the key space in a discontinuous
* manner, not all values in the given range will have a partition
* assigned.
11. get_partitions_for_keys_range seems to prefer to do "minoff -= 1",
but get_partitions_for_keys_list likes to "minoff--", can this be made
the same? Personally, I like -- over -= 1 as it's shorter. Although I
do remember having an argument with my university professor about
this. He claimed -= 1 was clearer... I'm still unsure what he found so
confusing about -- ...
12. The following code could be optimised a little for the case when
there's no default:
/*
* There may exist a range of values unassigned to any non-default
* partition between the datums at minoff and maxoff.
*/
for (i = minoff; i <= maxoff; i++)
{
if (boundinfo->indexes[i] < 0)
{
include_def = true;
break;
}
}
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* *could* be null.
*/
if (bms_num_members(keys->keyisnotnull) < partnatts)
include_def = true;
if (include_def && partition_bound_has_default(boundinfo))
result = bms_add_member(result, boundinfo->default_index);
return result;
Maybe something more like:
if (!partition_bound_has_default(boundinfo))
return result;
/*
* There may exist a range of values unassigned to any non-default
* partition between the datums at minoff and maxoff.
*/
for (i = minoff; i <= maxoff; i++)
{
if (boundinfo->indexes[i] < 0)
return bms_add_member(result, boundinfo->default_index);
}
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* *could* be null.
*/
if (bms_num_members(keys->keyisnotnull) < partnatts)
return bms_add_member(result, boundinfo->default_index);
Which saves a bit of needless work when there's no default to add,
and also saves a few lines, including the line where you declare the
include_def variable.
13. Variable name:
Bitmapset *partitioned_rels_bms = NULL;
This should likely be called partitioned_relids, and be of type Relids
instead of Bitmapset.
14. This line removal seems surplus. It should probably be fixed
independently of this patch.
parent_roots[appinfo->child_relid] = subroot;
-
continue;
15. I'm unsure how safe the following code is:
while ((parent_rti = bms_first_member(partitioned_rels_bms)) >= 0)
partitioned_rels = lappend_int(partitioned_rels, parent_rti);
You're now putting this list in ascending order of relid, but some
code in set_plan_refs assumes the root partition is the first element:
root->glob->rootResultRelations =
lappend_int(root->glob->rootResultRelations,
linitial_int(splan->partitioned_rels));
By luck, the first element might today be the root due to the way we
expand the inheritance hierarchy, but I think this code is wrong to
rely on that.
I'm not really a fan of having the root partition be the first element
in the List. I would much rather see a Relids type and a special Index
field for the root, but that might be more changes than you'd like to
make here. I just don't think what you have now is correct.
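A rough sketch of the concern, and of the "Relids plus a separate root
field" alternative, is below. Everything in it is hypothetical: ToyRelids
is a 32-bit mask standing in for a real Relids, ToyPartitionedRels is not a
struct from the patch, and the RT index numbering used in the note
afterwards is invented purely to show the ordering dependency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit n set means RT index n is a member. */
typedef uint32_t ToyRelids;

/*
 * Drain the set into out[] in ascending order, the way a loop over
 * bms_first_member() would.  Returns the number of members written.
 */
static int
toy_relids_to_list(ToyRelids set, int *out)
{
    int         n = 0;

    for (int bit = 0; bit < 32; bit++)
    {
        if (set & (UINT32_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}

/*
 * Hypothetical alternative: keep the root's RT index in its own field so
 * that no consumer has to assume anything about member ordering.
 */
typedef struct ToyPartitionedRels
{
    int         root_relid;     /* RT index of the root partitioned table */
    ToyRelids   relids;         /* unpruned partitioned tables, root included */
} ToyPartitionedRels;

static int
toy_root_relid(const ToyPartitionedRels *pr)
{
    return pr->root_relid;
}
```

If the root happened to have RT index 5 and a partitioned child RT index 3,
the drained list would start with 3, so code taking the first element as
the root would pick the wrong relation, whereas the explicit field is
unaffected by ordering.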
16. This should probably return Relids rather than Bitmapset *.
Bitmapset *
prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
Please update the mention of Bitmapset in the comment at the top of
the function too.
17. hmm, my patch did palloc(), not palloc0(). My original patch was
broken and missed this, but v2 got it.
context->clauseinfo = partclauseinfo =
palloc0(sizeof(PartitionClauseInfo));
There's no need to palloc0() here. You're setting all the fields to
zero just below. As far as I understand it, it's only Node types that
we have to go through the rigmarole of doing both.
18. I tentatively agree with you having changed the continue to break
in the following:
/* We can't use any volatile value to prune partitions. */
if (contain_volatile_functions((Node *) valueexpr))
break;
I believe it's not wrong to break here, but keep in mind you're
testing valueexpr rather than something with the OpExpr itself. The
reason I'm not saying this is wrong is that if the valueexpr is
volatile then it cannot possibly match another partition key anyway,
so there's likely no point in continuing to look for another match...
You should likely write a comment to explain this a bit. I think all
of the other places you've changed to break look fine. The
ScalarArrayOpExpr volatile function test is fine to break from since
the operands cannot be reversed in that case, so rightop certainly
can't match a partition key.
19. I might have caused this, but there's no such variable as 'cur'
/* cur is more restrictive, so replace the existing. */
20. Is there a difference between
partition_bound_has_default(context->boundinfo) and
context->has_default_part? Any reason for both? Your code uses both. I
don't yet understand why has_default_part was added.
That's all I have for now.
It's getting close. Good work!
I keep having to turn up my strictness level each time I review. I'd
like to be able to turn it up all the way earlier, but I fear I may
still be reviewing v1 if I'd done that :-)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
Thanks for the review.
On 2018/02/19 22:40, David Rowley wrote:
On 19 February 2018 at 22:19, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches. Thanks again!
Thanks for making those changes. I've made another pass of v28 and
have a few more comments.
The patch is starting to look good, but there are some new changes in
recent versions which still don't look quite right.
1. This does not fully make sense:
/*
* Remove the indexes of partitions excluded due to each of
* those partitions' *all* of allowed datums appearing in
* keys->ne_datums, that is compared to the partition key
* using <> operator.
*/
"each of those partitions' *all* of allowed" is not correct.
Maybe better to write:
/*
* Remove the indexes of any partitions which cannot possibly
* contain rows matching the clauses due to key->ne_datums containing
* all datum values which are allowed in the given partition. This
* is only possible to do in LIST partitioning as it's the only
* partitioning strategy which allows the specification of exact values.
*/
Ah, your rewrite sounds much better.
2. Mine does not, but some compilers may complain about
get_partitions_for_keys() result variable being uninitialised in
get_partitions_for_keys. Probably the easiest fix would be to just
assign to NULL in the default case.
Done.
3. Did you mean to put this Assert() inside the loop?
memset(keyisnull, false, sizeof(keyisnull));
i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
Assert(i < partnatts);
i will always be -2 at the end of the loop. Seems like a useless
Assert in its current location.
Face palm! Fixed.
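For anyone following along, here is a minimal standalone sketch of why the Assert only makes sense inside the loop. It is not the patch's code: next_member() is a hypothetical stand-in for bms_next_member() that works on a plain unsigned bitmask and mimics the convention of returning -2 once the members are exhausted, and PARTNATTS and mark_null_keys() are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PARTNATTS 4             /* illustrative number of partition keys */

/*
 * Hypothetical stand-in for bms_next_member(): return the smallest member
 * of 'set' greater than 'prevbit', or -2 if there is none.
 */
static int
next_member(unsigned int set, int prevbit)
{
    int         bit;

    for (bit = prevbit + 1; bit < (int) (sizeof(set) * 8); bit++)
    {
        if (set & (1u << bit))
            return bit;
    }
    return -2;
}

/*
 * Mark the keys listed in 'nullkeys' as null, count them, and return the
 * value the loop variable holds once the loop has finished.
 */
static int
mark_null_keys(unsigned int nullkeys, bool *keyisnull, int *n_eqkeys)
{
    int         i = -1;

    memset(keyisnull, false, sizeof(bool) * PARTNATTS);
    while ((i = next_member(nullkeys, i)) >= 0)
    {
        /* Checked here, the assertion actually guards the array access. */
        assert(i < PARTNATTS);
        (*n_eqkeys)++;
        keyisnull[i] = true;
    }

    /* Here i is always -2, so asserting "i < PARTNATTS" proves nothing. */
    return i;
}
```

Calling mark_null_keys() with a mask such as 0x5 marks keys 0 and 2 and leaves the loop variable at -2, which is the value the misplaced Assert was testing.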
4. Can you add a comment here to say: "Note: LIST partitioning only
supports a single partition key, therefore this function requires no
looping over the partition keys"
/*
* get_partitions_for_keys_list
* Return partitions of a list partitioned table for requested keys
*
* This interprets the keys and looks up partitions in the partition bound
* descriptor using the list partitioning semantics.
*/
OK, done.
5. The following comment contains a bit of duplication to the comment
which comes after it. Maybe the following:
/*
* If the query is looking for null keys, there can only be one such
* partition. Return the same if one exists.
*/
can be changed to:
/* Handle clauses requesting a NULL valued partition key */
You're right. Changed.
6. The following comment does not quite make sense:
/* Exactly matching datum exists. */
Probably better to write:
/* An exact matching datum exists. */
OK, done.
7. "If it's not equal (<)" I think you mean (>), not (<), in:
* The bound at minoff is <= minkeys, given the way
* partition_list_bsearch() works. If it's not equal (<), then
* increment minoff to make it point to the datum on the right
* that necessarily satisfies minkeys. Also do the same if it is
* equal but minkeys is exclusive.
However, the comment is a bit clumsy. Maybe the following is better?
/*
* partition_list_bsearch returning a positive number means that
* minkeys[0] must be greater than or equal to the smallest datum.
* If we didn't find an exact matching datum (!is_equal) or if the
* operator used was non-inclusive (>), then in both of these
* cases we're not interested in the datum pointed to by minoff,
* but we may start getting matches in the partition which the
* next datum belongs to, so point to that one instead. (This may
* be beyond the last datum in the array, but we'll detect that
* later.)
*/
Your rewrite is much better.
8. The following comment could be improved:
* minkeys is greater than the datums of all non-default partitions,
* meaning there isn't one to return. Return the default partition if
* one exists.
how about:
* The value of minkeys[0] is greater than all of the datums we have
* partitions for. The only possible partition that could contain a
* match is the default partition. Return that, if it exists.
OK, adopted your rewrite.
9. The following could also be improved:
* The bound at maxoff is <= maxkeys, given the way
* partition_list_bsearch works. If the bound at maxoff exactly
* matches maxkey (is_equal), but the maxkey is exclusive, then
* decrement maxoff to point to the bound on the left.
how about:
* partition_list_bsearch returning a positive number means that
* maxkeys[0] must be greater than or equal to the smallest datum.
* If the match found is an equal match, but the operator used is
* non-inclusive of that value (<), then the partition belonging
* to maxoff cannot match, so we'll decrement maxoff to point to
* the partition belonging to the previous datum. We might end up
* decrementing maxoff down to -1, but we'll handle that later.
OK, done.
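To make the minoff/maxoff adjustments in points 7 and 9 concrete, here is a small sketch over a sorted array of plain ints. It is only an illustration under simplifying assumptions: list_bsearch() is a cut-down stand-in for partition_list_bsearch() that compares ints directly instead of calling the partition support function, and first_matching_offset() and last_matching_offset() are hypothetical helper names, not functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified stand-in for partition_list_bsearch(): return the index of the
 * greatest element of the sorted array 'datums' that is <= value, or -1 if
 * every element is greater.  *is_equal is set if that element equals value.
 */
static int
list_bsearch(const int *datums, int ndatums, int value, bool *is_equal)
{
    int         lo = -1;
    int         hi = ndatums - 1;

    assert(ndatums >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (datums[mid] <= value)
        {
            lo = mid;
            *is_equal = (datums[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/* First offset whose datum can satisfy "key >= value" or "key > value". */
static int
first_matching_offset(const int *datums, int ndatums, int value,
                      bool inclusive)
{
    bool        is_equal;
    int         minoff = list_bsearch(datums, ndatums, value, &is_equal);

    /*
     * The datum at minoff is <= value.  Unless it is an exact match for an
     * inclusive operator it cannot satisfy the clause, so point to the next
     * datum.  The result may be ndatums, i.e. beyond the last datum.
     */
    if (!is_equal || !inclusive)
        minoff++;
    return minoff;
}

/* Last offset whose datum can satisfy "key <= value" or "key < value". */
static int
last_matching_offset(const int *datums, int ndatums, int value,
                     bool inclusive)
{
    bool        is_equal;
    int         maxoff = list_bsearch(datums, ndatums, value, &is_equal);

    /*
     * An exact match under an exclusive operator cannot qualify, so step
     * back to the previous datum.  The result may be -1.
     */
    if (is_equal && !inclusive)
        maxoff--;
    return maxoff;
}
```

With datums {10, 20, 30}, a lower bound of "> 20" starts at offset 2 and an upper bound of "< 10" ends at offset -1. Both out-of-range results are left for the caller to detect, as the rewritten comments say.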
10. Can you append " This may not technically be true for some data
types (e.g. integer types), however, we currently lack any sort of
infrastructure to provide us with proofs that would allow us to do
anything smarter here." to:
* For range queries, always include the default list partition,
* because list partitions divide the key space in a discontinuous
* manner, not all values in the given range will have a partition
* assigned.
Hmm, that seems to make sense, so added the text.
I guess you know it already, but I'm trying to say in the comment that
list partitioning, by definition, does not force you to enumerate all
values that a data type may specify to exist.
11. get_partitions_for_keys_range seems to prefer to do "minoff -= 1",
but get_partitions_for_keys_list likes to "minoff--", can this be made
the same? Personally, I like -- over -= 1 as it's shorter. Although I
do remember having an argument with my university professor about
this. He claimed -= 1 was clearer... I'm still unsure what he found so
confusing about -- ...
I will go with minoff-- for consistency as you say.
12. The following code could be optimised a little for the case when
there's no default:
/*
* There may exist a range of values unassigned to any non-default
* partition between the datums at minoff and maxoff.
*/
for (i = minoff; i <= maxoff; i++)
{
if (boundinfo->indexes[i] < 0)
{
include_def = true;
break;
}
}
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* *could* be null.
*/
if (bms_num_members(keys->keyisnotnull) < partnatts)
include_def = true;
if (include_def && partition_bound_has_default(boundinfo))
result = bms_add_member(result, boundinfo->default_index);
return result;
Maybe something more like:
if (!partition_bound_has_default(boundinfo))
return result;
/*
* There may exist a range of values unassigned to any non-default
* partition between the datums at minoff and maxoff.
*/
for (i = minoff; i <= maxoff; i++)
{
if (boundinfo->indexes[i] < 0)
return bms_add_member(result, boundinfo->default_index);
}
/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* *could* be null.
*/
if (bms_num_members(keys->keyisnotnull) < partnatts)
return bms_add_member(result, boundinfo->default_index);
Which saves a bit of needless work when there's no default to add,
and also saves a few lines, including the line where you declare the
include_def variable.
Ah, that's neat. Done that way. I also changed other code that sets
include_def to instead add the default partition index to result right
away, instead of just setting include_def. So, include_def is now gone.
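The early-return shape from point 12 can be sketched in isolation as below. This is only an illustration: SketchBoundInfo is a hypothetical, much-simplified stand-in for the real bound descriptor, and default_partition_to_add() returns an index (or -1) rather than adding a member to a Bitmapset as the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-simplified stand-in for the range bound descriptor. */
typedef struct SketchBoundInfo
{
    int         nindexes;       /* number of entries in indexes[] */
    const int  *indexes;        /* partition index per bound, -1 if none */
    int         default_index;  /* default partition's index, -1 if none */
} SketchBoundInfo;

/*
 * Decide whether the default partition must be scanned for the bounds in
 * [minoff, maxoff].  Returns the default partition's index, or -1 if it
 * need not (or cannot) be added.
 */
static int
default_partition_to_add(const SketchBoundInfo *b, int minoff, int maxoff,
                         bool some_key_may_be_null)
{
    int         i;

    assert(minoff >= 0 && maxoff < b->nindexes);

    /* No default partition: none of the checks below can matter. */
    if (b->default_index < 0)
        return -1;

    /* A gap between bounds is covered only by the default partition. */
    for (i = minoff; i <= maxoff; i++)
    {
        if (b->indexes[i] < 0)
            return b->default_index;
    }

    /* Keys with nulls are mapped to the default range partition. */
    if (some_key_may_be_null)
        return b->default_index;

    return -1;
}
```

When there is no default partition the function returns before looking at any bound, which is the saving described above, and no include_def-style flag is needed.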
13. Variable name:
Bitmapset *partitioned_rels_bms = NULL;
This should likely be called partitioned_relids, and be of type Relids
instead of Bitmapset.
Makes sense, done.
14. This line removal seems surplus. It should probably be fixed
independently of this patch.
parent_roots[appinfo->child_relid] = subroot;
-
continue;
Hadn't noticed that. Fixed.
15. I'm unsure how safe the following code is:
while ((parent_rti = bms_first_member(partitioned_rels_bms)) >= 0)
partitioned_rels = lappend_int(partitioned_rels, parent_rti);
You're now putting this list in ascending order of relid, but some
code in set_plan_refs assumes the root partition is the first element:
root->glob->rootResultRelations =
lappend_int(root->glob->rootResultRelations,
linitial_int(splan->partitioned_rels));
By luck, the first element might today be the root due to the way we
expand the inheritance hierarchy, but I think this code is wrong to
rely on that.
I'm not really a fan of having the root partition be the first element
in the List. I would much rather see a Relids type and a special Index
field for the root, but that might be more changes than you'd like to
make here. I just don't think what you have now is correct.
Hmm, so you're saying that it's not future-proof for the code here to
assume that the root table will always get the smallest RT index of the
tables in a given partition tree. I guess it is a concern that exists
independently of the changes this patch makes, but I agree with the
concern. We can write another patch to break that assumption by using the
method you suggest of adding an Index variable to the ModifyTable node to
store the root table index.
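As a rough illustration of the "Relids plus a separate root index" idea, consider the sketch below. All names are hypothetical: SketchRelids is a plain bitmask standing in for Relids, and the scenario where the root has a larger RT index than a child is an assumption made up to show the hazard, not something the planner does today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Relids: bit n set means RT index n is a member. */
typedef unsigned int SketchRelids;

typedef struct SketchPartitionedRels
{
    SketchRelids relids;        /* every partitioned table in the tree */
    int         root_rti;       /* RT index of the root, kept explicitly */
} SketchPartitionedRels;

/* Add one partitioned table, remembering it separately if it is the root. */
static void
add_partitioned_rel(SketchPartitionedRels *rels, int rti, bool is_root)
{
    assert(rti > 0 && rti < (int) (sizeof(SketchRelids) * 8));
    rels->relids |= 1u << rti;
    if (is_root)
        rels->root_rti = rti;
}

/* What an ordered walk of the set would visit first (0 if it is empty). */
static int
lowest_member(SketchRelids relids)
{
    int         rti;

    for (rti = 1; rti < (int) (sizeof(SketchRelids) * 8); rti++)
    {
        if (relids & (1u << rti))
            return rti;
    }
    return 0;
}
```

If the root were ever added with RT index 5 and a child with RT index 3, an ordered walk would yield 3 first while root_rti would still say 5, so code reading the explicit field would not depend on member ordering.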
16. This should probably return Relids rather than Bitmapset *.
Bitmapset *
prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
Please update the mention of Bitmapset in the comment at the top of
the function too.
OK, done.
17. hmm, my patch did palloc(), not palloc0(). My original patch was
broken and missed this, but v2 got it.
context->clauseinfo = partclauseinfo =
palloc0(sizeof(PartitionClauseInfo));
There's no need to palloc0() here. You're setting all the fields to
zero just below. As far as I understand it, it's only Node types that
we have to go through the rigmarole of doing both.
Ah, you're right. Used palloc.
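The point about palloc() versus palloc0() can be shown with a standalone analogue. In this sketch malloc() stands in for palloc() (calloc() would mirror palloc0()), and SketchClauseInfo with its three fields is a made-up struct, not the real PartitionClauseInfo layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical struct; the field names are illustrative only. */
typedef struct SketchClauseInfo
{
    int         nkeys;
    bool        foundkeys;
    bool        constfalse;
} SketchClauseInfo;

static SketchClauseInfo *
make_clause_info(void)
{
    /* malloc() stands in for palloc(); calloc() would mirror palloc0(). */
    SketchClauseInfo *info = malloc(sizeof(SketchClauseInfo));

    assert(info != NULL);

    /*
     * Every field is assigned explicitly right here, so zeroing the memory
     * at allocation time would only repeat the same work.
     */
    info->nkeys = 0;
    info->foundkeys = false;
    info->constfalse = false;
    return info;
}
```

The caveat is the usual one: this only holds while every field really is assigned, so a field added later needs its own initialisation.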
18. I tentatively agree with you having changed the continue to break
in the following:
/* We can't use any volatile value to prune partitions. */
if (contain_volatile_functions((Node *) valueexpr))
break;
I believe it's not wrong to break here, but keep in mind you're
testing valueexpr rather than something with the OpExpr itself. The
reason I'm not saying this is wrong is that if the valueexpr is
volatile then it cannot possibly match another partition key anyway,
so there's likely no point in continuing to look for another match...
You should likely write a comment to explain this a bit. I think all
of the other places you've changed to break look fine. The
ScalarArrayOpExpr volatile function test is fine to break from since
the operands cannot be reversed in that case, so rightop certainly
can't match a partition key.
When working on this comment, I realized we shouldn't really "break" on
the collation mismatch. Multiple keys can have the same expression, but
different collation and "break"ing in this case would mean, we'd miss
matching it to another key that has the matching collation. IOW, order in
which clauses appear in the input list determines if they're matched to
partition keys correctly or not.
See this somewhat made up example:
create table rp (a text) partition by range (substr(a, 1) collate "en_GB",
substr(a, 1) collate "en_US");
create table rp1 partition of rp for values from ('a', 'a') to ('a', 'e');
create table rp2 partition of rp for values from ('a', 'e') to ('a', 'z');
create table rp3 partition of rp for values from ('b', 'a') to ('b', 'e');
For the following query:
select * from rp where substr(a, 1) = 'e' collate "en_US" and substr(a, 1)
= 'a' collate "en_GB";
With the current code, we'll end up discarding (via break) the 1st clause
after its collation (en_US) doesn't match the 1st key's collation (en_GB)
and thus we'll end with only one clause, that is the 2nd one, being added
to matched clauses. That would result in only rp3 being pruned, whereas
both rp1 and rp3 should be pruned.
I've fixed that. With the new code, both the expression and the collation
should match before we conclude that the clause matched the partition key
and then check other properties of the clause like operator strictness,
valueexpr volatility, etc. If those other properties are not satisfied,
we can "break", because with those properties they won't be useful for
pruning, even if it matched some other key.
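The continue-versus-break distinction described above can be sketched as follows. This is a simplified illustration, not the code in the patch: expressions and collations are compared as strings, SketchKey, SketchClause and match_clause_to_key() are hypothetical names, and only the volatility check is modelled among the "other properties".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, string-based stand-ins for a partition key and a clause. */
typedef struct SketchKey
{
    const char *expr;
    const char *collation;
} SketchKey;

typedef struct SketchClause
{
    const char *expr;
    const char *collation;
    bool        value_is_volatile;
} SketchClause;

/*
 * Return the index of the partition key that 'clause' can be used with for
 * pruning, or -1 if it is of no use.
 */
static int
match_clause_to_key(const SketchClause *clause, const SketchKey *keys,
                    int nkeys)
{
    int         i;

    assert(nkeys >= 0);
    for (i = 0; i < nkeys; i++)
    {
        /*
         * Both the expression and the collation must match.  On a mismatch
         * keep looking, because a later key may have the same expression
         * with the collation this clause uses.
         */
        if (strcmp(clause->expr, keys[i].expr) != 0 ||
            strcmp(clause->collation, keys[i].collation) != 0)
            continue;

        /*
         * The clause matched this key.  A volatile value makes it useless
         * for pruning whichever key it matched, so stop looking.
         */
        if (clause->value_is_volatile)
            break;

        return i;
    }
    return -1;
}
```

With keys (substr(a, 1), en_GB) and (substr(a, 1), en_US), a clause using en_US skips the first key and matches the second, whereas breaking on the collation mismatch would have discarded it.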
19. I might have caused this, but there's no such variable as 'cur'
/* cur is more restrictive, so replace the existing. */
Fixed this and a few other instances.
20. Is there a difference between
partition_bound_has_default(context->boundinfo) and
context->has_default_part? Any reason for both? Your code uses both. I
don't yet understand why has_default_part was added.
One cannot use partition_bound_has_default(context->boundinfo) within
partprune.c, because PartitionBoundInfo's definition is private to
partition.c. Maybe, if someday we expose it by exporting it (and related
partition bound manipulating functions) in, say, a partbound.h, we won't
need to resort to storing the same value in two different
places like this.
That's all I have for now.
It's getting close. Good work!
I keep having to turn up my strictness level each time I review. I'd
like to be able to turn it up all the way earlier, but I fear I may
still be reviewing v1 if I'd done that :-)
It's really great that you've looked into these patches in such great
detail, suggesting various performance improvements (big and small) and
also pointing out various edge cases. Thank you very much! :)
Attached updated version.
Regards,
Amit
Attachments:
v29-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From 9fd20721ea59df2b4b3f77986ae079d860ab3d60 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v29 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b1c7cd6c72..edf30bda61 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1113,8 +1115,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1174,7 +1177,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2811,7 +2817,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2820,6 +2828,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2828,7 +2840,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2838,7 +2850,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2859,8 +2871,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2884,9 +2896,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2900,8 +2917,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2978,7 +2995,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3022,7 +3040,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
v29-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From f6318962f15572c7712601e63e8b95b616e3d1a9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v29 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signature and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index edf30bda61..90e24ee8ec 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/*
* RelationBuildPartitionDesc
@@ -1004,7 +1006,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1080,7 +1082,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1155,7 +1159,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2574,7 +2581,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2590,7 +2599,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2619,11 +2629,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2937,7 +2949,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2952,8 +2964,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2980,7 +2992,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2995,8 +3008,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3037,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3040,8 +3052,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3068,8 +3080,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3267,27 +3278,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
v29-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From 4a39b77208053889ae84471b97b7933480a4d789 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v29 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
Attachment: v29-0004-Faster-partition-pruning.patch (text/plain)
From 56310be48cfe78fdf5647c433ca7215b771b6485 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v29 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 664 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1515 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 470 +++++++-
src/test/regress/sql/partition_prune.sql | 94 +-
15 files changed, 2876 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
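
The <> handling that this patch adds for LIST partitions
(get_partitions_excluded_by_ne_datums() in partition.c) may be easier to
follow with a standalone sketch. What follows is only an illustration
under simplifying assumptions, not code from the patch: partition keys are
plain ints, the bound arrays are passed in directly instead of through
PartitionBoundInfo, and the names sketch_list_bsearch() and
sketch_excluded_by_ne() are hypothetical. A partition is excluded only
when every value it permits is covered by some <> datum; the default
partition owns no bounds, so it is never excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Binary search over sorted list bounds; returns the offset of value or -1. */
static int
sketch_list_bsearch(const int *bounds, int ndatums, int value)
{
    int lo = 0;
    int hi = ndatums - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (bounds[mid] == value)
            return mid;
        if (bounds[mid] < value)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/*
 * bounds[] holds the sorted list bound values and owner[i] is the index of
 * the partition that accepts bounds[i].  On return, excluded[p] is true for
 * every partition p whose permitted values all appear in ne[].  Returns the
 * number of excluded partitions, or -1 if memory could not be allocated.
 */
int
sketch_excluded_by_ne(const int *bounds, const int *owner, int ndatums,
                      int nparts, const int *ne, int n_ne, bool *excluded)
{
    /* Allocate one extra element so that zero-sized inputs stay valid. */
    bool *found_offset = calloc((size_t) ndatums + 1, sizeof(bool));
    int  *in_part = calloc((size_t) nparts + 1, sizeof(int));
    int  *found = calloc((size_t) nparts + 1, sizeof(int));
    int   nexcluded = -1;
    int   i;

    if (found_offset != NULL && in_part != NULL && found != NULL)
    {
        /* Mark matched bound offsets; duplicate <> datums count only once. */
        for (i = 0; i < n_ne; i++)
        {
            int off = sketch_list_bsearch(bounds, ndatums, ne[i]);

            if (off >= 0)
                found_offset[off] = true;
        }

        /* Count permitted and matched datums per partition in one pass. */
        for (i = 0; i < ndatums; i++)
        {
            assert(owner[i] >= 0 && owner[i] < nparts);
            in_part[owner[i]]++;
            if (found_offset[i])
                found[owner[i]]++;
        }

        /* A partition that owns no bounds (the default) is never excluded. */
        nexcluded = 0;
        for (i = 0; i < nparts; i++)
        {
            excluded[i] = (found[i] > 0 && found[i] >= in_part[i]);
            if (excluded[i])
                nexcluded++;
        }
    }

    free(found_offset);
    free(in_part);
    free(found);
    return nexcluded;
}
```

For instance, with bounds {1, 2, 3, 4} owned by partitions {0, 0, 1, 2}
and partition 3 acting as the default, the clauses a <> 1 AND a <> 2 AND
a <> 4 cover all values of partitions 0 and 2, so only those two would be
excluded. The sketch leaves out the NULL-partition handling that the
patch performs at the end of the real function.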
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 90e24ee8ec..59e3234938 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,664 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkey is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but it may not be the case. It might actually be the
+ * upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..a9eba3a831 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..625937fdfa
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1515 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partitioning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. Especially, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause that matches a
+ * partition key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j: array index; must not clobber key index i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, we only check that at least one of its
+ * args matches the partition key, not all of them. For an argument that
+ * doesn't match, we cannot eliminate any partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields are the same as in the parent context, except clauseinfo,
+ * which is set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in context->clauseinfo to inform the caller that we found such
+ * a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse in context->clauseinfo to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of either operand is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing the =
+ * operator for all partition key columns; if so, we don't need the
+ * values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to come from operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses into their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..ed27ca921e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition? */
+ bool has_default_part;
+
+ /* Partition qual, if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition lookup keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality lookup key. Stores known Datum values from clauses that
+ * compare the partition key using an equality operator.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Flags indicating whether the clauses corresponding to the datums stored
+ * in minkeys and maxkeys, respectively, are inclusive of the stored value
+ * or not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..2b84ed90bf
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..bb5eeacbe7 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,395 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "en_GB";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "en_GB")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "en_GB")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US" and substr(a, 1) = 'a' collate "en_GB";
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "en_US") AND (substr(a, 1) = 'a'::text COLLATE "en_GB"))
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..5809146c38 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,96 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "en_GB";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US" and substr(a, 1) = 'a' collate "en_GB";
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi;
--
2.11.0
v29-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch
From 827d21adc0530c2870dac5e65879cc2692a5f18c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v29 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase, when it is not
yet known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a
plan is built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
some related code in prepunion.c.
---
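As a rough illustration of the "bubble up" idea in the commit message above, here is a minimal standalone sketch. It does not use PostgreSQL's planner types: ToyRel, MAX_RELS and collect_partitioned_rels() are hypothetical names made up for this note, and the "pruned" flag merely stands in for whatever the planner uses to mark a child as eliminated. It only shows the shape of the computation, namely that each partitioned rel starts with its own RT index and then appends the lists of its unpruned children.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RELS 16				/* hypothetical fixed capacity for the sketch */

/* Toy stand-in for a planner rel; NOT PostgreSQL's RelOptInfo. */
typedef struct ToyRel
{
	int			rti;			/* range table index of this rel */
	bool		is_partitioned; /* is this rel a partitioned table? */
	bool		pruned;			/* was this rel eliminated by pruning? */
	int			nchildren;
	struct ToyRel *children[MAX_RELS];
	int			npartitioned;	/* number of valid entries in the list below */
	int			partitioned_child_rels[MAX_RELS];
} ToyRel;

/*
 * Fill rel->partitioned_child_rels with the RT indexes of rel itself (if it
 * is partitioned) and of every unpruned partitioned table below it.
 */
static void
collect_partitioned_rels(ToyRel *rel)
{
	rel->npartitioned = 0;

	/* Leaf partitions contribute nothing. */
	if (!rel->is_partitioned)
		return;

	/* A partitioned rel's list always starts with its own RT index. */
	rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (int i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		/* Pruned children, and everything below them, are skipped. */
		if (child->pruned)
			continue;

		collect_partitioned_rels(child);

		/* Bubble up the child's list into the parent's list. */
		for (int j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

With a root (RT index 1) that has a leaf partition (2), an unpruned sub-partitioned child (3) and a pruned sub-partitioned child (5), the root's list ends up holding only 1 and 3, which is the point of the patch: the pruned partitioned table is never recorded.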
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..1bb76dd4f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5039,9 +5024,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a9eba3a831..17eae105ec 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..8fa90b1f48 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5952,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
On Tue, Feb 20, 2018 at 12:34 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated version.
Hi Amit,
I applied the v29 patch set on HEAD and got "ERROR: operator 1209 is not
a member of opfamily 1994" with the test case below. Please take a look.
CREATE TABLE part (c1 INT4, c2 TEXT, c3 INT4) PARTITION BY LIST (c2);
CREATE TABLE part_p1 PARTITION OF part FOR VALUES IN('ABC');
CREATE TABLE part_p2 PARTITION OF part FOR VALUES IN('DEF');
CREATE TABLE part_p3 PARTITION OF part FOR VALUES IN('GHI');
CREATE TABLE part_p4 PARTITION OF part FOR VALUES IN('JKL');
INSERT INTO part VALUES (100,'ABC',10);
INSERT INTO part VALUES (110,'DEF',20);
INSERT INTO part VALUES (120,'GHI',10);
INSERT INTO part VALUES (130,'JKL',100);
explain (costs off) SELECT * FROM part WHERE c2 LIKE '%ABC%';
*ERROR: operator 1209 is not a member of opfamily 1994*
Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Thanks Rajkumar for the test.
On 2018/02/20 16:56, Rajkumar Raghuwanshi wrote:
I have applied v29 patch-set on head and got "ERROR: operator 1209 is not
a member of opfamily 1994" with below test case. Please take a look.
CREATE TABLE part (c1 INT4, c2 TEXT, c3 INT4) PARTITION BY LIST (c2);
CREATE TABLE part_p1 PARTITION OF part FOR VALUES IN('ABC');
CREATE TABLE part_p2 PARTITION OF part FOR VALUES IN('DEF');
CREATE TABLE part_p3 PARTITION OF part FOR VALUES IN('GHI');
CREATE TABLE part_p4 PARTITION OF part FOR VALUES IN('JKL');
INSERT INTO part VALUES (100,'ABC',10);
INSERT INTO part VALUES (110,'DEF',20);
INSERT INTO part VALUES (120,'GHI',10);
INSERT INTO part VALUES (130,'JKL',100);
explain (costs off) SELECT * FROM part WHERE c2 LIKE '%ABC%';
*ERROR: operator 1209 is not a member of opfamily 1994*
An oversight in the v28 patch seems to have caused this. Fixed in the
attached.
Thanks,
Amit
Attachments:
v30-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From 9fd20721ea59df2b4b3f77986ae079d860ab3d60 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v30 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relation. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b1c7cd6c72..edf30bda61 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1113,8 +1115,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1174,7 +1177,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2811,7 +2817,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2820,6 +2828,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2828,7 +2840,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2838,7 +2850,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2859,8 +2871,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2884,9 +2896,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2900,8 +2917,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2978,7 +2995,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3022,7 +3040,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
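A note on the refactoring in the patch above: the comparison functions now take only the PartitionKey members they actually use (attribute count, support functions, collations). A minimal sketch of that pattern follows. The names cmp_fn, int_cmp and bound_cmp are hypothetical, plain ints stand in for Datums, and function pointers stand in for FmgrInfo, so this illustrates the idea rather than the PostgreSQL code itself.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-column comparison support function. */
typedef int (*cmp_fn) (int a, int b);

static int
int_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/*
 * Compare two multi-column bounds column by column.  The caller passes only
 * the column count and the per-column comparators, not a whole key object,
 * so callers that hold no key (e.g. code merging bounds) can still use it.
 */
static int
bound_cmp(int natts, const cmp_fn *supfunc, const int *b1, const int *b2)
{
    int i;

    assert(natts > 0);
    for (i = 0; i < natts; i++)
    {
        int cmpval = supfunc[i] (b1[i], b2[i]);

        /* The first column that differs decides the ordering. */
        if (cmpval != 0)
            return cmpval;
    }
    return 0;
}
```

With both columns compared, {1, 5} sorts before {1, 7}; with natts set to 1 the two bounds compare equal, which is the prefix-comparison behaviour the natts argument makes possible.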
v30-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From f6318962f15572c7712601e63e8b95b616e3d1a9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v30 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signatures and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index edf30bda61..90e24ee8ec 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/*
* RelationBuildPartitionDesc
@@ -1004,7 +1006,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1080,7 +1082,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1155,7 +1159,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2574,7 +2581,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2590,7 +2599,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2619,11 +2629,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2937,7 +2949,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2952,8 +2964,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2980,7 +2992,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2995,8 +3008,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3037,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3040,8 +3052,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3068,8 +3080,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3267,27 +3278,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
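The search functions touched by the patch above all share one contract: return the offset of the greatest bound that is less than or equal to the probe, or -1 if every bound is greater, and report through is_equal whether the match was exact. A minimal sketch of that contract, assuming a sorted array of plain ints in place of bound datums (bound_bsearch is a hypothetical name, not a function from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the index of the greatest element of the sorted array 'datums'
 * that is <= 'value', or -1 if every element is greater.  *is_equal is
 * set to true only when the element found equals 'value'.
 */
static int
bound_bsearch(const int *datums, int ndatums, int value, bool *is_equal)
{
    int lo = -1;
    int hi = ndatums - 1;

    assert(ndatums >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        /* Round the midpoint up so that "lo = mid" always makes progress. */
        int mid = lo + (hi - lo + 1) / 2;

        if (datums[mid] <= value)
        {
            lo = mid;
            *is_equal = (datums[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}
```

Starting lo at -1 is what lets the function signal "all bounds are greater" without a separate flag; callers then test for offset >= 0 before using it.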
v30-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From 4a39b77208053889ae84471b97b7933480a4d789 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v30 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
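The scheme matching in find_partition_scheme() above compares per-column arrays with memcmp(). A minimal sketch of that check, assuming a cut-down struct (SchemeSketch and scheme_matches are hypothetical names, and Oid is typedef'd locally as a stand-in). The point it tries to illustrate is that the byte count handed to memcmp() should be derived from the element type of the arrays being compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;       /* local stand-in for PostgreSQL's Oid */

/* Hypothetical, reduced version of a partitioning scheme. */
typedef struct SchemeSketch
{
    int         natts;          /* number of partition key columns */
    const Oid  *typid;          /* per-column type OIDs */
    const Oid  *opfamily;       /* per-column operator family OIDs */
    const Oid  *collation;      /* per-column partitioning collations */
} SchemeSketch;

static bool
scheme_matches(const SchemeSketch *a, const SchemeSketch *b)
{
    size_t      nbytes;

    assert(a->natts > 0 && b->natts > 0);
    if (a->natts != b->natts)
        return false;

    /* Size the comparison by the arrays' element type (Oid). */
    nbytes = sizeof(Oid) * (size_t) a->natts;

    return memcmp(a->typid, b->typid, nbytes) == 0 &&
        memcmp(a->opfamily, b->opfamily, nbytes) == 0 &&
        memcmp(a->collation, b->collation, nbytes) == 0;
}
```

Two schemes that differ only in the collation of their last column are reported as different here because the full sizeof(Oid) * natts bytes are compared; a smaller element size would look only at a prefix of the array.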
v30-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 6e0b758c33ff71dac6a46c788e6e27ba802a6ad3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v30 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 664 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1519 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2904 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 90e24ee8ec..59e3234938 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,664 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous manner,
+ * not all values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkey is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the right-most
+ * partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but that may not be the case. Each might actually be the
+ * upper bound of an unassigned range of values; if so, move minoff/maxoff
+ * to the adjacent bound, which must be the upper bound of a valid
+ * partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..a9eba3a831 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..f94540285f
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1519 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. Caller must set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. Especially, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause matching a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This guarantees that nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j indexes array elements; i is the key index */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * An OR expression is considered matched if at least one of its args
+ * matches the partition key; not all of them need to. For an argument that
+ * doesn't match, we cannot eliminate any partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields except clauseinfo are the same as in the parent context;
+ * clauseinfo will be set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a contradiction; before returning, it sets
+ * constfalse in context->clauseinfo to inform the caller that the clauses
+ * cannot be satisfied.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add the surviving clauses to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL
+ * clauses. LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the maximum or minimum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns; if so, we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
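
To make the pass structure of remove_redundant_clauses() easier to follow, here is a minimal standalone sketch of the same idea for a single integer key. It is not code from the patch: SketchClause, sketch_cmp() and sketch_minimize() are hypothetical names, plain ints stand in for Datums, and every comparison is assumed to be possible (the patch also has to handle comparisons that cannot be performed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the five btree strategies: <, <=, =, >=, > */
enum { ST_LT = 1, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX = ST_GT };

typedef struct SketchClause
{
	int			strategy;	/* one of ST_* */
	int			value;		/* constant in "key <op> value" */
} SketchClause;

/* Evaluate "left <op> right" for the given strategy. */
bool
sketch_cmp(int strategy, int left, int right)
{
	switch (strategy)
	{
		case ST_LT: return left < right;
		case ST_LE: return left <= right;
		case ST_EQ: return left == right;
		case ST_GE: return left >= right;
		case ST_GT: return left > right;
	}
	assert(false);			/* unknown strategy */
	return false;
}

/*
 * Reduce 'in' to at most one clause per strategy.  best[s] (s = strategy - 1)
 * is set to the surviving clause or NULL.  Returns false on contradiction.
 */
bool
sketch_minimize(const SketchClause *in, size_t n,
				const SketchClause *best[ST_MAX])
{
	const SketchClause *lt, *le, *gt, *ge;
	size_t		i;
	int			s;

	for (s = 0; s < ST_MAX; s++)
		best[s] = NULL;

	/* Pass 1: keep the most restrictive clause within each strategy. */
	for (i = 0; i < n; i++)
	{
		const SketchClause *pc = &in[i];

		s = pc->strategy - 1;
		assert(s >= 0 && s < ST_MAX);
		if (best[s] == NULL ||
			sketch_cmp(pc->strategy, pc->value, best[s]->value))
			best[s] = pc;
		else if (pc->strategy == ST_EQ)
			return false;	/* e.g. x = 1 AND x = 2 */
	}

	/* Pass 2: an equality clause either contradicts or supersedes the rest. */
	if (best[ST_EQ - 1])
	{
		for (s = 0; s < ST_MAX; s++)
		{
			if (s == ST_EQ - 1 || best[s] == NULL)
				continue;
			if (!sketch_cmp(s + 1, best[ST_EQ - 1]->value, best[s]->value))
				return false;	/* e.g. x = 5 AND x < 5 */
			best[s] = NULL;
		}
	}

	/* Pass 3: keep only one of <, <= and only one of >, >=. */
	lt = best[ST_LT - 1];
	le = best[ST_LE - 1];
	if (lt && le)
		best[(lt->value <= le->value) ? ST_LE - 1 : ST_LT - 1] = NULL;

	gt = best[ST_GT - 1];
	ge = best[ST_GE - 1];
	if (gt && ge)
		best[(gt->value >= ge->value) ? ST_GE - 1 : ST_GT - 1] = NULL;

	return true;
}
```

With the clauses x > 1, x > 2 and x >= 5 as input, only x >= 5 survives in best[]. With x = 1 and x = 2 the function returns false, which corresponds to the patch setting constfalse.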
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..ed27ca921e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition? */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
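
As a worked illustration of how the PartScanKeyInfo fields above get filled for range partitioning, here is a simplified standalone model of the prefix rule used by extract_bounding_datums(). It is only a sketch under stated assumptions: the Sketch* types and sketch_extract_bounds() are hypothetical names, ints replace Datums, hash partitioning and <> clauses are left out, and the input is assumed to hold at most one clause per key and direction (as it would after redundant clauses have been removed).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_KEYS 4

typedef enum { OP_EQUAL, OP_LESS, OP_GREATER } SketchOp;

typedef struct SketchKeyClause
{
	int			keyno;		/* partition key column the clause refers to */
	SketchOp	op;
	bool		incl;		/* true for =, <=, >= */
	int			value;
} SketchKeyClause;

/* Cut-down model of PartScanKeyInfo (no ne_datums, no null bitmaps). */
typedef struct SketchScanKeys
{
	int			eqkeys[SKETCH_MAX_KEYS];
	int			minkeys[SKETCH_MAX_KEYS];
	int			maxkeys[SKETCH_MAX_KEYS];
	int			n_eqkeys, n_minkeys, n_maxkeys;
	bool		min_incl, max_incl;
} SketchScanKeys;

void
sketch_extract_bounds(const SketchKeyClause *clauses, int nclauses,
					  int partnatts, SketchScanKeys *keys)
{
	bool		need_eq = true, need_min = true, need_max = true;
	int			i, c;

	assert(partnatts > 0 && partnatts <= SKETCH_MAX_KEYS);
	memset(keys, 0, sizeof(*keys));

	for (i = 0; i < partnatts; i++)
	{
		/* Each tuple must cover a prefix of the key; a gap ends it. */
		if (i > keys->n_eqkeys)
			need_eq = false;
		if (i > keys->n_minkeys)
			need_min = false;
		if (i > keys->n_maxkeys)
			need_max = false;

		for (c = 0; c < nclauses; c++)
		{
			const SketchKeyClause *cl = &clauses[c];

			if (cl->keyno != i)
				continue;

			/* An equality value extends all three tuples. */
			if (cl->op == OP_EQUAL && need_eq)
				keys->eqkeys[keys->n_eqkeys++] = cl->value;

			if ((cl->op == OP_EQUAL || cl->op == OP_LESS) && need_max)
			{
				keys->maxkeys[keys->n_maxkeys++] = cl->value;
				keys->max_incl = cl->incl;
				if (!cl->incl)
					need_eq = need_max = false;
			}

			if ((cl->op == OP_EQUAL || cl->op == OP_GREATER) && need_min)
			{
				keys->minkeys[keys->n_minkeys++] = cl->value;
				keys->min_incl = cl->incl;
				if (!cl->incl)
					need_eq = need_min = false;
			}
		}
	}

	/* Equality lookup needs every key column; otherwise fall back to bounds. */
	if (keys->n_eqkeys == partnatts)
		keys->n_minkeys = keys->n_maxkeys = 0;
	else
		keys->n_eqkeys = 0;
}
```

For a table range-partitioned on (a, b) and the clauses a = 1 AND b < 10, this yields the inclusive minimum tuple (1) and the exclusive maximum tuple (1, 10), with no equality tuple. A clause on b alone yields nothing, since it does not cover a prefix of the key.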
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..2b84ed90bf
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..da432af693 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "en_US")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "en_GB";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "en_GB")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "en_GB")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US" and substr(a, 1) = 'a' collate "en_GB";
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "en_US") AND (substr(a, 1) = 'a'::text COLLATE "en_GB"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..5cccd95767 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "en_GB";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "en_US" and substr(a, 1) = 'a' collate "en_GB";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
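A note on the hash partitioning tests in partition_prune.sql above, where clauses on only one of the two keys (or non-equality clauses) leave all four partitions in the plan. The sketch below is one way to picture why: the target partition comes from a hash over the whole key tuple, so nothing can be computed until every key is pinned down by an equality or IS NULL clause. Everything here is hypothetical and for illustration only: ToyKey, toy_mix and toy_hash_partition are not names from the patch, the mixer is not the hash function PostgreSQL uses, key values are assumed to be already reduced to integers, and skipping NULL keys is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the query's clauses say about one key. */
typedef struct ToyKey
{
	bool		known;		/* is there a "key = const" or "key IS NULL" clause? */
	bool		isnull;		/* true if the clause is "key IS NULL" */
	uint64_t	value;		/* the constant, when known and not null */
} ToyKey;

/* Toy 64-bit mixer; NOT the hash function PostgreSQL uses. */
static uint64_t
toy_mix(uint64_t x)
{
	x ^= x >> 33;
	x *= UINT64_C(0xff51afd7ed558ccd);
	x ^= x >> 33;
	return x;
}

/*
 * Return the remainder identifying the single partition to scan, or -1 if
 * no pruning is possible because at least one key is not pinned down.
 */
static int
toy_hash_partition(const ToyKey *keys, int nkeys, int modulus)
{
	uint64_t	rowhash = 0;
	int			i;

	assert(modulus > 0);

	for (i = 0; i < nkeys; i++)
	{
		if (!keys[i].known)
			return -1;		/* partial key: the row hash is undefined */
		if (keys[i].isnull)
			continue;		/* assumption: NULL keys add nothing to the hash */
		rowhash ^= toy_mix(keys[i].value) + UINT64_C(0x9e3779b97f4a7c15)
			+ (rowhash << 6) + (rowhash >> 2);
	}

	return (int) (rowhash % (uint64_t) modulus);
}
```

Calling toy_hash_partition() with only the first key marked as known returns -1, which corresponds to the "partial keys won't prune" cases. With both keys marked IS NULL the toy row hash stays 0 and the result is remainder 0; in the test output above the (null, null) row is likewise stored in hp0, the remainder-0 partition.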
Attachment: v30-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch
From 75211764b38c69c59e8f0569f41dba00d99625da Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v30 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes Append, MergeAppend, and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that should be known.
That means we no longer need the PartitionedChildRelInfo node or the
related code in prepunion.c.
---
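A rough illustration of the idea described in the commit message, in case it helps review. This is only a sketch under stated assumptions, not the code in the patch: ToyRel and toy_collect_partitioned_rels are hypothetical stand-ins for RelOptInfo and the planner logic, fixed-size arrays replace the integer List, and the two steps the patch performs separately (seeding the list with the relation's own RT index in set_append_rel_size(), then bubbling up child indexes during set_append_rel_pathlist()) are folded into one recursive pass.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RELS 16

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL struct. */
typedef struct ToyRel
{
	int			rti;			/* range table index */
	bool		is_partitioned;	/* partitioned table, as opposed to a leaf */
	bool		pruned;			/* eliminated by partition pruning */
	int			nchildren;
	struct ToyRel *children[TOY_MAX_RELS];
	int			npartitioned;	/* entries used in partitioned_child_rels */
	int			partitioned_child_rels[TOY_MAX_RELS];
} ToyRel;

/*
 * Build rel->partitioned_child_rels: the RT index of rel itself, followed by
 * the indexes of every unpruned partitioned table below it.
 */
static void
toy_collect_partitioned_rels(ToyRel *rel)
{
	int			i;
	int			j;

	rel->npartitioned = 0;
	if (!rel->is_partitioned)
		return;					/* leaf partitions contribute nothing */

	/* Start with this relation's own RT index. */
	rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		if (child->pruned)
			continue;			/* pruned subtrees are never visited */

		toy_collect_partitioned_rels(child);

		/* Bubble the child's list up into the parent's list. */
		for (j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < TOY_MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

For a root with RT index 1 whose children are an unpruned partitioned table (index 2), a pruned partitioned table (index 3) and a leaf, the root's list ends up as {1, 2}. Index 3 is absent, which is the difference from the current behaviour of recording every partitioned table in the tree.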
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..1bb76dd4f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5039,9 +5024,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a9eba3a831..17eae105ec 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..8fa90b1f48 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5952,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
v30-0004-Faster-partition-pruning.patch contains:
+create table coll_pruning_multi (a text) partition by range
(substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
This'll likely work okay on Linux. Other collate tests seem to use
COLLATE "POSIX" or "C" so that they work cross-platform.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/02/21 10:19, David Rowley wrote:
v30-0004-Faster-partition-pruning.patch contains:
+create table coll_pruning_multi (a text) partition by range
(substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
This'll likely work okay on Linux. Other collate tests seem to use
COLLATE "POSIX" or "C" so that they work cross-platform.
Thanks. I completely forgot about that. I've rewritten those tests to
use "POSIX" and "C" in the attached.
Thanks,
Amit
Attachments:
v31-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch (text/plain; charset=UTF-8)
From 3be19ce59123302b65a3fb13a84a92e52f3a0235 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v31 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v31-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 7e8944bc4df68d2d8e6870fd73d31cfde141b862 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v31 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 664 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1519 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2904 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 90e24ee8ec..59e3234938 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,664 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * A non-negative result from partition_list_bsearch means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * A non-negative result from partition_list_bsearch means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not
+ * all values in the given range will have a partition assigned. This may
+ * not technically be true for some data types (e.g. integer types);
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as there are partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkeys is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff += 1;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff -= 1;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff += 1;
+ else
+ maxoff -= 1;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff += 1;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself as
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff += 1;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but it may not be the case. It might actually be the
+ * upper bound of an unassigned range of values; if so, move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ minoff += 1;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+ maxoff -= 1;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..a9eba3a831 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..f94540285f
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1519 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist for this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for pruning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. In particular, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses whose at least one argument clause matches a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing a true or false value and returns true
+ * if we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields are the same as in the parent context, except clauseinfo,
+ * which is set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in context->clauseinfo to inform the caller that we found such
+ * a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to the output list.
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of either operand is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing the =
+ * operator for all partition key columns; if so, we don't need the
+ * values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'.
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
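
Note for reviewers, not part of the patch: the redundancy rules that remove_redundant_clauses() applies per btree strategy may be easier to check in isolation. What follows is only a minimal standalone sketch under simplifying assumptions: a single integer key, plain C comparisons instead of opfamily lookups, and every value known at plan time. The names SketchOp, SketchClause, SketchResult, apply_op and sketch_minimize are hypothetical and exist nowhere in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for btree strategies and PartClause. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_GE, OP_GT, OP_COUNT } SketchOp;

typedef struct { SketchOp op; int value; } SketchClause;

typedef struct
{
    bool has[OP_COUNT];   /* is a clause kept for this strategy? */
    int  value[OP_COUNT]; /* its constant operand, if kept */
    bool constfalse;      /* clauses proved mutually contradictory */
} SketchResult;

/* Evaluate "left op right" for two known constants. */
static bool
apply_op(SketchOp op, int left, int right)
{
    switch (op)
    {
        case OP_LT: return left < right;
        case OP_LE: return left <= right;
        case OP_EQ: return left == right;
        case OP_GE: return left >= right;
        case OP_GT: return left > right;
        default: break;
    }
    return false;
}

SketchResult
sketch_minimize(const SketchClause *clauses, int nclauses)
{
    SketchResult res;
    int i, s;

    memset(&res, 0, sizeof(res));

    /* Step 1: within each strategy, keep the most restrictive clause. */
    for (i = 0; i < nclauses; i++)
    {
        SketchOp op = clauses[i].op;
        int v = clauses[i].value;

        assert(op < OP_COUNT);
        if (!res.has[op])
        {
            res.has[op] = true;
            res.value[op] = v;
        }
        else if (apply_op(op, v, res.value[op]))
            res.value[op] = v;          /* new clause is tighter */
        else if (op == OP_EQ)
        {
            res.constfalse = true;      /* e.g. x = 1 AND x = 2 */
            return res;
        }
    }

    /* Step 2: an equality clause either contradicts or subsumes the rest. */
    if (res.has[OP_EQ])
    {
        for (s = 0; s < OP_COUNT; s++)
        {
            if (s == OP_EQ || !res.has[s])
                continue;
            if (!apply_op((SketchOp) s, res.value[OP_EQ], res.value[s]))
            {
                res.constfalse = true;  /* e.g. x = 5 AND x < 5 */
                return res;
            }
            res.has[s] = false;
        }
    }

    /* Step 3: keep only one of <, <= and only one of >, >=. */
    if (res.has[OP_LT] && res.has[OP_LE])
        res.has[res.value[OP_LT] <= res.value[OP_LE] ? OP_LE : OP_LT] = false;
    if (res.has[OP_GT] && res.has[OP_GE])
        res.has[res.value[OP_GT] >= res.value[OP_GE] ? OP_GE : OP_GT] = false;

    return res;
}
```

With the clauses x > 1, x > 2 and x >= 5, the sketch keeps only x >= 5, which matches the example in the header comment of remove_redundant_clauses(). Unlike the patch, it has no "could not compare" path, because integer constants are always comparable here.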
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..ed27ca921e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual, if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality lookup key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
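
To illustrate how the ne_datums array in PartScanKeyInfo might be consumed for list partitioning, here is a minimal standalone sketch. It is not the code in this patch: ListPart, value_excluded() and list_part_survives_ne() are hypothetical names, and plain strings stand in for Datums. The assumed rule is that a list partition can be skipped only when every value it accepts appears among the <> datums, that a partition accepting only NULL is always skipped because <> is strict, and that the default partition is never skipped on this basis.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for one list partition: the non-null values it
 * accepts.  A partition that accepts only NULL has nvalues == 0.
 */
typedef struct ListPart
{
    const char *name;
    const char *const *values;
    size_t      nvalues;
    bool        is_default;
} ListPart;

/* Is value v one of the datums compared with <> in the query? */
bool
value_excluded(const char *v, const char *const *ne, size_t n_ne)
{
    for (size_t i = 0; i < n_ne; i++)
    {
        if (strcmp(v, ne[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Returns true if the partition must still be scanned for a qual of the
 * form "key <> ne[0] AND key <> ne[1] AND ...".
 */
bool
list_part_survives_ne(const ListPart *part,
                      const char *const *ne, size_t n_ne)
{
    /* The default partition may hold any value not listed elsewhere. */
    if (part->is_default)
        return true;

    /* One accepted value that is not excluded is enough to keep it. */
    for (size_t i = 0; i < part->nvalues; i++)
    {
        if (!value_excluded(part->values[i], ne, n_ne))
            return true;
    }

    /*
     * Every accepted value is excluded.  Any NULLs stored here cannot
     * satisfy a strict <> clause either, so the partition is prunable.
     */
    return false;
}
```

Assuming lp_ad accepts 'a' and 'd', this keeps lp_ad for a <> 'a' but drops it for a <> 'a' AND a <> 'd', and drops a NULL-only partition in both cases, which matches the expected output for the lp queries in partition_prune.out.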
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..2b84ed90bf
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..948cad4c3d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..08fc2dbc21 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
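As a reading aid for the `<>` tests at the end of the patch above: a list partition can only be skipped for `key <> v` clauses when every value it accepts is excluded, whereas a range or hash partition cannot be ruled out this way. The following is a minimal sketch of that rule under simplifying assumptions (integer keys, no NULL handling). The helper name `toy_list_partition_prunable` is hypothetical and is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: decide whether a list
 * partition accepting vals[0..nvals) can be skipped for a query whose
 * WHERE clause is "key <> excluded[0] AND key <> excluded[1] AND ...".
 */
static bool
toy_list_partition_prunable(const int *vals, size_t nvals,
                            const int *excluded, size_t nexcluded)
{
    size_t      i,
                j;

    assert(vals != NULL || nvals == 0);
    assert(excluded != NULL || nexcluded == 0);

    /* A partition with no known values is kept, to stay conservative. */
    if (nvals == 0)
        return false;

    for (i = 0; i < nvals; i++)
    {
        bool        found = false;

        for (j = 0; j < nexcluded; j++)
        {
            if (vals[i] == excluded[j])
            {
                found = true;
                break;
            }
        }

        /* One surviving value is enough to require scanning the partition. */
        if (!found)
            return false;
    }

    return true;
}
```

With this rule, a partition accepting only the value 1 is prunable under `key <> 1`, but a partition accepting 1 and 2 is prunable only if both `key <> 1` and `key <> 2` are present.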
v31-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From e1a03e302a16247dd81df14681a270834291d837 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v31 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The (Merge)Append and ModifyTable planner nodes each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan
is built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..1bb76dd4f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5039,9 +5024,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a9eba3a831..17eae105ec 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..8fa90b1f48 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5952,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
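The bubbling-up of partitioned RT indexes that the patch above performs across set_append_rel_size() and set_append_rel_pathlist() can be sketched in isolation. This is only an illustration under simplifying assumptions: `ToyRel` and the `toy_*` functions are hypothetical stand-ins for RelOptInfo and the planner code, fixed-size arrays replace PostgreSQL's List, and the two planner phases are folded into one recursive walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 16

/* Hypothetical stand-in for RelOptInfo; not PostgreSQL's real struct. */
typedef struct ToyRel
{
    int         rti;            /* range-table index of this rel */
    bool        is_partitioned; /* is this a partitioned table? */
    bool        is_dummy;       /* true if pruning proved it empty */
    int         nchildren;
    struct ToyRel *children[TOY_MAX];
    int         npartitioned;
    int         partitioned_child_rels[TOY_MAX];
} ToyRel;

static void
toy_rel_init(ToyRel *rel, int rti, bool is_partitioned, bool is_dummy)
{
    rel->rti = rti;
    rel->is_partitioned = is_partitioned;
    rel->is_dummy = is_dummy;
    rel->nchildren = 0;
    rel->npartitioned = 0;
}

static void
toy_rel_add_child(ToyRel *parent, ToyRel *child)
{
    assert(parent->nchildren < TOY_MAX);
    parent->children[parent->nchildren++] = child;
}

/*
 * Start each partitioned rel's list with its own RT index, then append
 * the lists of its live (non-dummy) children, so that the root ends up
 * with the indexes of all unpruned partitioned tables in the tree.
 */
static void
toy_collect_partitioned_rels(ToyRel *rel)
{
    int         i,
                j;

    rel->npartitioned = 0;
    if (!rel->is_partitioned)
        return;

    rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
    {
        ToyRel     *child = rel->children[i];

        /* Pruned children contribute nothing to the parent's list. */
        if (child->is_dummy)
            continue;

        toy_collect_partitioned_rels(child);
        for (j = 0; j < child->npartitioned; j++)
        {
            assert(rel->npartitioned < TOY_MAX);
            rel->partitioned_child_rels[rel->npartitioned++] =
                child->partitioned_child_rels[j];
        }
    }
}
```

For a root (RT index 1) with one live partitioned child (2) and one pruned partitioned child (4), the root's list ends up containing only 1 and 2, which is the effect the commit message describes.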
v31-0001-Modify-bound-comparision-functions-to-accept-mem.patch (text/plain; charset=UTF-8)
From 487712fb7f49da269b3097d1416d7061f5869289 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v31 1/5] Modify bound comparison functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relation. So,
modify these functions to accept only the required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b1c7cd6c72..edf30bda61 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1113,8 +1115,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1174,7 +1177,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2811,7 +2817,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2820,6 +2828,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give the number of attributes in
+ * the bounds to be compared, the comparison functions to be used and the
+ * collations of the attributes, respectively.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2828,7 +2840,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2838,7 +2850,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2859,8 +2871,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2884,9 +2896,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give the number of attributes
+ * in the bounds to be compared, the comparison functions to be used and the
+ * collations of the attributes, respectively.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2900,8 +2917,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2978,7 +2995,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3022,7 +3040,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
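The refactoring in the patch above follows a common pattern: a comparator that used to take a whole key struct now takes only the members it reads, so that callers without such a struct can still use it. A minimal sketch of that pattern follows. All names (`toy_cmp_fn`, `toy_int_cmp`, `toy_bound_cmp`) are hypothetical; plain integers and function pointers stand in for Datum and FmgrInfo, and collations, MINVALUE/MAXVALUE handling and the lower/upper tie-break are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-column comparison support function. */
typedef int (*toy_cmp_fn) (int a, int b);

static int
toy_int_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/*
 * Compare two multi-column bounds column by column. The caller passes the
 * number of columns and the per-column comparators directly, instead of a
 * key struct that bundles them.
 */
static int
toy_bound_cmp(int partnatts, const toy_cmp_fn *partsupfunc,
              const int *datums1, const int *datums2)
{
    int         i;

    assert(partnatts >= 0);

    for (i = 0; i < partnatts; i++)
    {
        int         cmpval = partsupfunc[i] (datums1[i], datums2[i]);

        /* The first differing column decides the result. */
        if (cmpval != 0)
            return cmpval;
    }

    return 0;
}
```

A caller that does hold a key struct simply passes its members, as the patch does with key->partnatts and key->partsupfunc, while a caller that only has the arrays can supply them as they are.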
v31-0002-Refactor-partition-bound-search-functions.patch (text/plain; charset=UTF-8)
From fc961dd4e9bd20d04cb9e4de506dae4ef01453d6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v31 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signatures and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index edf30bda61..90e24ee8ec 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/*
* RelationBuildPartitionDesc
@@ -1004,7 +1006,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1080,7 +1082,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1155,7 +1159,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2574,7 +2581,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2590,7 +2599,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2619,11 +2629,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2937,7 +2949,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2952,8 +2964,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2980,7 +2992,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2995,8 +3008,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3037,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3040,8 +3052,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3068,8 +3080,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3267,27 +3278,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
On 21 February 2018 at 14:53, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/02/21 10:19, David Rowley wrote:
v30-0004-Faster-partition-pruning.patch contains:
+create table coll_pruning_multi (a text) partition by range
(substr(a, 1) collate "en_GB", substr(a, 1) collate "en_US");
This'll likely work okay on Linux. Other collate tests seem to use
COLLATE "POSIX" or "C" so that they work cross-platform.
Thanks. I completely forgot about that. I've rewritten those tests to
use "POSIX" and "C" in the attached.
Thanks for fixing. I made a pass over v31 and only see a few small things:
1. In get_partitions_for_keys() why is the
get_partitions_excluded_by_ne_datums call not part of
get_partitions_for_keys_list?
2. Still a stray "minoff += 1;" in get_partitions_for_keys_range
3. You're also preferring to minoff--/++, but maxoff -= 1/maxoff += 1;
would be nice to see the style unified here.
4. "other other"
* that is, each of its fields other other than clauseinfo must be valid before
5. "a IS NULL" -> "an IS NULL":
* Based on a IS NULL or IS NOT NULL clause that was matched to a partition
6. Can you add a warning in the header comment for
extract_partition_clauses() to explain "Note: the 'clauses' List may
be modified inside this function. Callers may like to make a copy of
important lists before passing them to this function.", or something
like that...
7. "null" -> "nulls"
* Only allow strict operators. This will guarantee null are
8. "dicard" -> "discard"
* contains a <= 2, then because 3 <= 2 is false, we dicard a < 3 as
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
Thanks for the review.
On 2018/02/21 19:15, David Rowley wrote:
Thanks for fixing. I made a pass over v31 and only see a few small things:
1. In get_partitions_for_keys() why is the
get_partitions_excluded_by_ne_datums call not part of
get_partitions_for_keys_list?
Hmm, there is a question of where exactly to put the call within
get_partitions_for_keys_list(). At the end would sound like an obvious
answer, but we tend to short-circuit return from that function at various
points, which it seems undesirable to change. So, I left things as is here.
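To make the trade-off concrete, here is a minimal sketch (not code from the patch). PartSet, select_list_partitions() and select_partitions() are made-up names, and a plain bitmask stands in for a Bitmapset. It only shows why applying the exclusion in the caller covers every early-return path of the callee at once.

```c
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means partition i is kept. */
typedef uint32_t PartSet;

/*
 * Toy analogue of a list-pruning function with several early returns,
 * none of which know anything about "<>" datums.  Assumes nparts <= 32
 * and eq_part < 32.
 */
PartSet
select_list_partitions(int nparts, int has_eq_key, int eq_part)
{
	PartSet		all = (nparts >= 32) ? UINT32_MAX : ((1u << nparts) - 1);

	if (nparts == 0)
		return 0;				/* early return: nothing to scan */
	if (has_eq_key)
		return 1u << eq_part;	/* early return: exactly one partition */
	return all;					/* fall-through: scan everything */
}

/* The caller applies the exclusion once, whichever path the callee took. */
PartSet
select_partitions(int nparts, int has_eq_key, int eq_part, PartSet excluded)
{
	PartSet		result = select_list_partitions(nparts, has_eq_key, eq_part);

	return result & ~excluded;
}
```

Moving the exclusion into the callee would mean either repeating it before each return or restructuring the returns into a single exit point.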
2. Still a stray "minoff += 1;" in get_partitions_for_keys_range
I actually found a few and changed them to ++ or --, as applicable.
3. You're also preferring to minoff--/++, but maxoff -= 1/maxoff += 1;
would be nice to see the style unified here.
Fixed all as mentioned above.
4. "other other"
* that is, each of its fields other other than clauseinfo must be valid before
Fixed.
5. "a IS NULL" -> "an IS NULL":
* Based on a IS NULL or IS NOT NULL clause that was matched to a partition
Fixed.
6. Can you add a warning in the header comment for
extract_partition_clauses() to explain "Note: the 'clauses' List may
be modified inside this function. Callers may like to make a copy of
important lists before passing them to this function.", or something
like that...
At least in my patch, extract_partition_clauses() is a local function with
just one caller, but I still don't see any problem with warning the
reader. So, done.
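For readers unfamiliar with the hazard, a rough standalone sketch follows (an illustration only, not taken from the patch). The Node type and all four functions are hypothetical. In PostgreSQL itself a caller would more likely reach for list_copy() on a List.

```c
#include <stdlib.h>

/* Hypothetical minimal list; PostgreSQL's real List type is richer. */
typedef struct Node
{
	int			value;
	struct Node *next;
} Node;

/* Push a value on the front of the list and return the new head. */
Node *
push_front(Node *head, int value)
{
	Node	   *n = malloc(sizeof(Node));

	if (n == NULL)
		abort();
	n->value = value;
	n->next = head;
	return n;
}

/* Return a copy of the list, preserving element order. */
Node *
copy_list(const Node *head)
{
	Node	   *copy = NULL;
	Node	  **tail = &copy;

	for (; head != NULL; head = head->next)
	{
		*tail = push_front(NULL, head->value);
		tail = &(*tail)->next;
	}
	return copy;
}

/* Count the elements of the list. */
int
list_len(const Node *head)
{
	int			n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}

/*
 * Stand-in for a function that edits its input: unlinks and frees every
 * negative value, returning the possibly-new head.
 */
Node *
drop_negatives(Node *head)
{
	Node	  **link = &head;

	while (*link != NULL)
	{
		if ((*link)->value < 0)
		{
			Node	   *dead = *link;

			*link = dead->next;
			free(dead);
		}
		else
			link = &(*link)->next;
	}
	return head;
}
```

A caller that still needs the original list would pass drop_negatives(copy_list(orig)) rather than orig itself.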
7. "null" -> "nulls"
* Only allow strict operators. This will guarantee null are
8. "dicard" -> "discard"
* contains a <= 2, then because 3 <= 2 is false, we dicard a < 3 as
Fixed.
Please find attached updated patches.
Thanks,
Amit
Attachments:
v32-0001-Modify-bound-comparision-functions-to-accept-mem.patch
From 487712fb7f49da269b3097d1416d7061f5869289 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date: Thu, 6 Jul 2017 14:15:22 +0530
Subject: [PATCH v32 1/5] Modify bound comparision functions to accept members
of PartitionKey
Functions partition_rbound_cmp() and partition_rbound_datum_cmp() are
required to merge partition bounds from joining relations. While doing
so, we do not have access to the PartitionKey of either relations. So,
modify these functions to accept only required members of PartitionKey
so that the functions can be reused for merging bounds.
Ashutosh Bapat.
---
src/backend/catalog/partition.c | 53 ++++++++++++++++++++++++++++-------------
1 file changed, 36 insertions(+), 17 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b1c7cd6c72..edf30bda61 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -165,10 +165,12 @@ static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
List *datums, bool lower);
static int32 partition_hbound_cmp(int modulus1, int remainder1, int modulus2,
int remainder2);
-static int32 partition_rbound_cmp(PartitionKey key,
- Datum *datums1, PartitionRangeDatumKind *kind1,
- bool lower1, PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(PartitionKey key,
+static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation, Datum *datums1,
+ PartitionRangeDatumKind *kind1, bool lower1,
+ PartitionRangeBound *b2);
+static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
@@ -1113,8 +1115,9 @@ check_new_partition_bound(char *relname, Relation parent,
* First check if the resulting range would be empty with
* specified lower and upper bounds
*/
- if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
- upper) >= 0)
+ if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, lower->datums,
+ lower->kind, true, upper) >= 0)
{
ereport(ERROR,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1174,7 +1177,10 @@ check_new_partition_bound(char *relname, Relation parent,
kind = boundinfo->kind[offset + 1];
is_lower = (boundinfo->indexes[offset + 1] == -1);
- cmpval = partition_rbound_cmp(key, datums, kind,
+ cmpval = partition_rbound_cmp(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ datums, kind,
is_lower, upper);
if (cmpval < 0)
{
@@ -2811,7 +2817,9 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
PartitionKey key = (PartitionKey) arg;
- return partition_rbound_cmp(key, b1->datums, b1->kind, b1->lower, b2);
+ return partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation, b1->datums, b1->kind,
+ b1->lower, b2);
}
/*
@@ -2820,6 +2828,10 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* Return for two range bounds whether the 1st one (specified in datums1,
* kind1, and lower1) is <, =, or > the bound specified in *b2.
*
+ * partnatts, partsupfunc and partcollation give number of attributes in the
+ * bounds to be compared, comparison function to be used and the collations of
+ * attributes resp.
+ *
* Note that if the values of the two range bounds compare equal, then we take
* into account whether they are upper or lower bounds, and an upper bound is
* considered to be smaller than a lower bound. This is important to the way
@@ -2828,7 +2840,7 @@ qsort_partition_rbound_cmp(const void *a, const void *b, void *arg)
* two contiguous partitions.
*/
static int32
-partition_rbound_cmp(PartitionKey key,
+partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
Datum *datums1, PartitionRangeDatumKind *kind1,
bool lower1, PartitionRangeBound *b2)
{
@@ -2838,7 +2850,7 @@ partition_rbound_cmp(PartitionKey key,
PartitionRangeDatumKind *kind2 = b2->kind;
bool lower2 = b2->lower;
- for (i = 0; i < key->partnatts; i++)
+ for (i = 0; i < partnatts; i++)
{
/*
* First, handle cases where the column is unbounded, which should not
@@ -2859,8 +2871,8 @@ partition_rbound_cmp(PartitionKey key,
*/
break;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
datums1[i],
datums2[i]));
if (cmpval != 0)
@@ -2884,9 +2896,14 @@ partition_rbound_cmp(PartitionKey key,
*
* Return whether range bound (specified in rb_datums, rb_kind, and rb_lower)
* is <, =, or > partition key of tuple (tuple_datums)
+ *
+ * n_tuple_datums, partsupfunc and partcollation give number of attributes in
+ * the bounds to be compared, comparison function to be used and the collations
+ * of attributes resp.
+ *
*/
static int32
-partition_rbound_datum_cmp(PartitionKey key,
+partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
{
@@ -2900,8 +2917,8 @@ partition_rbound_datum_cmp(PartitionKey key,
else if (rb_kind[i] == PARTITION_RANGE_DATUM_MAXVALUE)
return 1;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
- key->partcollation[i],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
+ partcollation[i],
rb_datums[i],
tuple_datums[i]));
if (cmpval != 0)
@@ -2978,7 +2995,8 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key,
+ cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3022,7 +3040,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key,
+ cmpval = partition_rbound_datum_cmp(key->partsupfunc,
+ key->partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
--
2.11.0
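
The idea behind the patch above (pass the individual members a comparison needs instead of the whole key struct) can be sketched outside PostgreSQL roughly as follows. This is an illustration under assumptions: ColumnCmp, cmp_asc(), cmp_desc() and bound_cmp() are hypothetical names, plain long values stand in for Datum, and a function pointer stands in for an FmgrInfo support function. Collations and MINVALUE/MAXVALUE handling are left out.

```c
/* Hypothetical per-column comparator; stands in for a support function. */
typedef int (*ColumnCmp) (long a, long b);

/* Ascending three-way comparison. */
int
cmp_asc(long a, long b)
{
	return (a > b) - (a < b);
}

/* Descending three-way comparison. */
int
cmp_desc(long a, long b)
{
	return (b > a) - (b < a);
}

/*
 * Compare two multi-column bounds column by column and stop at the first
 * column that differs.  Only the pieces actually needed (column count and
 * comparators) are passed in, so a caller with no key struct at hand can
 * still use the function.
 */
int
bound_cmp(int natts, const ColumnCmp *cmpfuncs,
		  const long *bound1, const long *bound2)
{
	int			i;

	for (i = 0; i < natts; i++)
	{
		int			c = cmpfuncs[i] (bound1[i], bound2[i]);

		if (c != 0)
			return c;
	}
	return 0;
}
```

Because the column count is an explicit argument, a caller can also compare on a prefix of the columns simply by passing a smaller natts.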
v32-0002-Refactor-partition-bound-search-functions.patch
From fc961dd4e9bd20d04cb9e4de506dae4ef01453d6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 8 Feb 2018 19:08:12 +0900
Subject: [PATCH v32 2/5] Refactor partition bound search functions
Remove the PartitionKey argument from their signature and instead
provide the necessary information through other arguments.
---
src/backend/catalog/partition.c | 75 +++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index edf30bda61..90e24ee8ec 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -174,22 +174,24 @@ static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums);
-static int partition_list_bsearch(PartitionKey key,
+static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal);
-static int partition_range_bsearch(PartitionKey key,
+static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(PartitionKey key,
+static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(PartitionKey key, Datum *values, bool *isnull);
+static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
/*
* RelationBuildPartitionDesc
@@ -1004,7 +1006,7 @@ check_new_partition_bound(char *relname, Relation parent,
* boundinfo->datums that is less than or equal to the
* (spec->modulus, spec->remainder) pair.
*/
- offset = partition_hash_bsearch(key, boundinfo,
+ offset = partition_hash_bsearch(boundinfo,
spec->modulus,
spec->remainder);
if (offset < 0)
@@ -1080,7 +1082,9 @@ check_new_partition_bound(char *relname, Relation parent,
int offset;
bool equal;
- offset = partition_list_bsearch(key, boundinfo,
+ offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
+ boundinfo,
val->constvalue,
&equal);
if (offset >= 0 && equal)
@@ -1155,7 +1159,10 @@ check_new_partition_bound(char *relname, Relation parent,
* since the index array is initialised with an extra -1
* at the end.
*/
- offset = partition_range_bsearch(key, boundinfo, lower,
+ offset = partition_range_bsearch(key->partnatts,
+ key->partsupfunc,
+ key->partcollation,
+ boundinfo, lower,
&equal);
if (boundinfo->indexes[offset + 1] < 0)
@@ -2574,7 +2581,9 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
int greatest_modulus = get_greatest_modulus(boundinfo);
- uint64 rowHash = compute_hash_value(key, values, isnull);
+ uint64 rowHash = compute_hash_value(key->partnatts,
+ key->partsupfunc,
+ values, isnull);
part_index = boundinfo->indexes[rowHash % greatest_modulus];
}
@@ -2590,7 +2599,8 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
{
bool equal = false;
- bound_offset = partition_list_bsearch(key,
+ bound_offset = partition_list_bsearch(key->partsupfunc,
+ key->partcollation,
partdesc->boundinfo,
values[0], &equal);
if (bound_offset >= 0 && equal)
@@ -2619,11 +2629,13 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
if (!range_partkey_has_null)
{
- bound_offset = partition_range_datum_bsearch(key,
- partdesc->boundinfo,
- key->partnatts,
- values,
- &equal);
+ bound_offset =
+ partition_range_datum_bsearch(key->partsupfunc,
+ key->partcollation,
+ partdesc->boundinfo,
+ key->partnatts,
+ values,
+ &equal);
/*
* The bound at bound_offset is less than or equal to the
* tuple value, so the bound at offset+1 is the upper
@@ -2937,7 +2949,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* to the input value.
*/
static int
-partition_list_bsearch(PartitionKey key,
+partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
{
@@ -2952,8 +2964,8 @@ partition_list_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
- key->partcollation[0],
+ cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
+ partcollation[0],
boundinfo->datums[mid][0],
value));
if (cmpval <= 0)
@@ -2980,7 +2992,8 @@ partition_list_bsearch(PartitionKey key,
* to the input range bound
*/
static int
-partition_range_bsearch(PartitionKey key,
+partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
PartitionBoundInfo boundinfo,
PartitionRangeBound *probe, bool *is_equal)
{
@@ -2995,8 +3008,7 @@ partition_range_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_cmp(key->partnatts, key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_cmp(partnatts, partsupfunc, partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
(boundinfo->indexes[mid] == -1),
@@ -3025,7 +3037,7 @@ partition_range_bsearch(PartitionKey key,
* to the input tuple.
*/
static int
-partition_range_datum_bsearch(PartitionKey key,
+partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
{
@@ -3040,8 +3052,8 @@ partition_range_datum_bsearch(PartitionKey key,
int32 cmpval;
mid = (lo + hi + 1) / 2;
- cmpval = partition_rbound_datum_cmp(key->partsupfunc,
- key->partcollation,
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[mid],
boundinfo->kind[mid],
values,
@@ -3068,8 +3080,7 @@ partition_range_datum_bsearch(PartitionKey key,
* all of them are greater
*/
static int
-partition_hash_bsearch(PartitionKey key,
- PartitionBoundInfo boundinfo,
+partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
int lo,
@@ -3267,27 +3278,27 @@ get_greatest_modulus(PartitionBoundInfo bound)
* Compute the hash value for given not null partition key values.
*/
static uint64
-compute_hash_value(PartitionKey key, Datum *values, bool *isnull)
+compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull)
{
int i;
- int nkeys = key->partnatts;
uint64 rowHash = 0;
Datum seed = UInt64GetDatum(HASH_PARTITION_SEED);
- for (i = 0; i < nkeys; i++)
+ for (i = 0; i < partnatts; i++)
{
if (!isnull[i])
{
Datum hash;
- Assert(OidIsValid(key->partsupfunc[i].fn_oid));
+ Assert(OidIsValid(partsupfunc[i].fn_oid));
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
- hash = FunctionCall2(&key->partsupfunc[i], values[i], seed);
+ hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
--
2.11.0
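
The search functions touched by this patch all look for the greatest bound that is less than or equal to the probe, using mid = (lo + hi + 1) / 2. A self-contained sketch of that search shape is below. It is an approximation: the diff only shows fragments of the loops, so the rest of the loop body here is an assumption, and DatumCmp, int_cmp() and bound_bsearch() are hypothetical names, with plain ints standing in for Datum.

```c
#include <stdbool.h>

/* Hypothetical comparator type; stands in for a partition support function. */
typedef int (*DatumCmp) (int bound, int probe);

/* Three-way integer comparison. */
int
int_cmp(int bound, int probe)
{
	return (bound > probe) - (bound < probe);
}

/*
 * Return the index of the greatest element of the sorted array 'bounds'
 * that is <= probe, or -1 if every element is greater.  *is_equal is set
 * to true when an exact match is found.
 */
int
bound_bsearch(DatumCmp cmp, const int *bounds, int nbounds,
			  int probe, bool *is_equal)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	*is_equal = false;
	while (lo < hi)
	{
		/* Round up so that assigning lo = mid always makes progress. */
		int			mid = (lo + hi + 1) / 2;
		int			c = cmp(bounds[mid], probe);

		if (c <= 0)
		{
			lo = mid;
			*is_equal = (c == 0);
			if (*is_equal)
				break;
		}
		else
			hi = mid - 1;
	}
	return lo;
}
```

Starting lo at -1 lets the same return value signal "no bound is small enough" without a separate flag, and passing the comparator explicitly mirrors the patch's move away from a PartitionKey argument.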
v32-0003-Add-parttypid-partcollation-partsupfunc-to-Parti.patch
From 3be19ce59123302b65a3fb13a84a92e52f3a0235 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v32 3/5] Add parttypid, partcollation, partsupfunc to
PartitionScheme
---
src/backend/optimizer/util/plancat.c | 43 +++++++++++++++++++++++++-----------
src/include/nodes/relation.h | 9 ++++++++
2 files changed, 39 insertions(+), 13 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..dcfc1665a8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1887,22 +1887,26 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
continue;
/* Match the partition key types. */
- if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
+ if (memcmp(partkey->parttypid, part_scheme->parttypid,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
/*
- * Length and byval information should match when partopcintype
+ * typlen, typbyval, typcoll information should match when typid
* matches.
*/
Assert(memcmp(partkey->parttyplen, part_scheme->parttyplen,
sizeof(int16) * partnatts) == 0);
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ Assert(memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) == 0);
/* Found matching partition scheme. */
return part_scheme;
@@ -1918,16 +1922,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
part_scheme->strategy = partkey->strategy;
part_scheme->partnatts = partkey->partnatts;
- part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopfamily, partkey->partopfamily,
- sizeof(Oid) * partnatts);
-
- part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->partopcintype, partkey->partopcintype,
- sizeof(Oid) * partnatts);
-
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->parttypid = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypid, partkey->parttypid,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
@@ -1938,6 +1934,27 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopfamily = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopfamily, partkey->partopfamily,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partopcintype = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partopcintype, partkey->partopcintype,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..ce9975c620 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -342,6 +343,9 @@ typedef struct PlannerInfo
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
+ *
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
*/
typedef struct PartitionSchemeData
{
@@ -350,10 +354,15 @@ typedef struct PartitionSchemeData
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collation */
/* Cached information about partition key data types. */
+ Oid *parttypid;
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Array of partition key comparison function pointers */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v32-0004-Faster-partition-pruning.patch
From 33b2d8ff77a93b0f03bcd77b4146d14fa42235e5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v32 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
Dilip Kumar (dilipbalaut@gmail.com),
---
src/backend/catalog/partition.c | 669 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1523 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2913 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 90e24ee8ec..b9e9b68abe 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the index of partitions that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+
+ /* Some partitions might have to be removed from result */
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not
+ * all values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkey is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff++;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make
+ * maxoff point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff--;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff++;
+ else
+ maxoff--;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff++;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * as the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff++;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but that may not be the case. It might actually be the
+ * upper bound of an unassigned range of values, in which case we move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..a9eba3a831 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..e1f320b264
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1523 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. Especially, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause that matches a
+ * partition key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.parttypid = rel->part_scheme->parttypid;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of important lists before passing them to this
+ * function.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant && IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j; /* j must not clobber key index i */
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause or its negation matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, we only require that at least one of
+ * its args matches the partition key, not all. For an argument that
+ * doesn't match, we cannot eliminate any partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields are the same as in the parent context except clauseinfo,
+ * which is set below by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the
+ * most restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in context->clauseinfo to inform the caller that we found such
+ * a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse in context->clauseinfo to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->parttypid[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(parttypid, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(parttypid, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all
+ * keys be matched by some combination of equality clauses and IS NULL
+ * clauses. LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->parttypid[i], value,
+ &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->parttypid[0], pc->value,
+ &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != parttypid)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ parttypid, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index dcfc1665a8..f3063be6d9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..ed27ca921e 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *parttypid;
+ Oid *partopfamily;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition? */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ce9975c620..5ee23a5bb5 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -538,6 +538,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -666,6 +668,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..2b84ed90bf
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..948cad4c3d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..08fc2dbc21 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
Attachment: v32-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain)
From 90c4094666726ce1fe1fe6293e7b234b47479a55 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v32 5/5] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 82255b0d1d..1bb76dd4f5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5039,9 +5024,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b9bc8e38d7..cf381573e9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3184,9 +3174,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a9eba3a831..17eae105ec 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..8fa90b1f48 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5952,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5ee23a5bb5..5579940d98 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -542,6 +543,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2126,27 +2131,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
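The commit message of the 0005 patch above describes building the
partitioned_rels list from only those partitioned tables that survive
pruning, by seeding each partitioned rel's list with its own RT index and
then bubbling the lists of live children up to the parent. A minimal,
self-contained sketch of that idea follows. ToyRel and
toy_collect_partitioned_rels are hypothetical names, not PostgreSQL code,
and the sketch folds into one recursive function what the patch does in
two separate phases (set_append_rel_size() and set_append_rel_pathlist()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8
#define MAX_RELS 32

/* Hypothetical, much-simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
    int         rti;            /* range table index */
    bool        is_partitioned; /* is this a partitioned table? */
    bool        is_dummy;       /* pruned, i.e. proven empty */
    int         nchildren;
    struct ToyRel *children[MAX_CHILDREN];
    int         npartitioned;   /* length of partitioned_child_rels */
    int         partitioned_child_rels[MAX_RELS];
} ToyRel;

/*
 * Seed the list with the rel's own RT index, then append the list of each
 * live (non-dummy) child.  Pruned children contribute nothing, so RT
 * indexes of pruned partitioned tables never reach the root's list.
 */
void
toy_collect_partitioned_rels(ToyRel *rel)
{
    rel->npartitioned = 0;
    if (!rel->is_partitioned)
        return;

    rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

    for (int i = 0; i < rel->nchildren; i++)
    {
        ToyRel     *child = rel->children[i];

        if (child->is_dummy)
            continue;

        toy_collect_partitioned_rels(child);
        for (int j = 0; j < child->npartitioned; j++)
        {
            assert(rel->npartitioned < MAX_RELS);
            rel->partitioned_child_rels[rel->npartitioned++] =
                child->partitioned_child_rels[j];
        }
    }
}
```

With a root (RT index 1) that has two partitioned children, one live
(RT index 2) and one pruned (RT index 3), the root's list ends up holding
only indexes 1 and 2.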
On 21 February 2018 at 23:44, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached updated patches.
Thanks for updating the code.
The question I have now is around NULL handling in
partkey_datum_from_expr(). I've not managed to find a way to get a
NULL Const in there as it seems all the clauses I try get removed
somewhere earlier in planning. Do you know for a fact that a NULL
Const is impossible to get there?
I'm having to add some NULL handling there for the run-time pruning
patch but wondered if it was also required for your patch.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/02/22 17:41, David Rowley wrote:
On 21 February 2018 at 23:44, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached updated patches.
Thanks for updating the code.
The question I have now is around NULL handling in
partkey_datum_from_expr(). I've not managed to find a way to get a
NULL Const in there as it seems all the clauses I try get removed
somewhere earlier in planning. Do you know for a fact that a NULL
Const is impossible to get there?
We only ever call partkey_datum_from_expr() for an OpExpr's arg and if you
have a NULL Const in there, eval_const_expressions() would've folded the
OpExpr's and subsequently any AND'd OpExpr's into a constant-false qual.
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
explain select * from p where a = null and a = 1;
QUERY PLAN
-------------------------------------------
Result (cost=0.00..0.00 rows=0 width=40)
One-Time Filter: false
explain select * from p where (a = null and a = 1) or a = 2;
QUERY PLAN
----------------------------------------------------------
Append (cost=0.00..41.94 rows=13 width=4)
-> Seq Scan on p2 (cost=0.00..41.88 rows=13 width=4)
Filter: (a = 2)
(3 rows)
I'm having to add some NULL handling there for the run-time pruning
patch but wondered if it was also required for your patch.
Hmm, not sure why. Can you explain a bit more?
Thanks,
Amit
On 22 February 2018 at 22:48, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I'm having to add some NULL handling there for the run-time pruning
patch but wondered if it was also required for your patch.
Hmm, not sure why. Can you explain a bit more?
hmm, yeah, but perhaps we should be discussing on the other thread...
With a prepared statement the Param will be unavailable until
execution, in which case we don't do the const folding.
A simple case is:
create table listp (a int) partition by list (a);
create table listp1 partition of listp for values in(1);
prepare q1 (int) as select * from listp where a = $1;
explain analyze execute q1(1); -- repeat 5 times.
explain analyze execute q1(null); -- partkey_datum_from_expr() gets a
-- NULL param via the call from nodeAppend.c
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/02/22 20:28, David Rowley wrote:
On 22 February 2018 at 22:48, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I'm having to add some NULL handling there for the run-time pruning
patch but wondered if it was also required for your patch.
Hmm, not sure why. Can you explain a bit more?
hmm, yeah, but perhaps we should be discussing on the other thread...
With a prepared statement the Param will be unavailable until
execution, in which case we don't do the const folding.
Ah right.
A simple case is:
create table listp (a int) partition by list (a);
create table listp1 partition of listp for values in(1);
prepare q1 (int) as select * from listp where a = $1;
explain analyze execute q1(1); -- repeat 5 times.
explain analyze execute q1(null); -- partkey_datum_from_expr() gets a
-- NULL param via the call from nodeAppend.c
I wonder if NULLs should somehow be managed at a higher level, resulting
in the same behavior as const-folding in the optimizer produces? In any
case, I suppose that would be something for the run-time pruning patch to
handle.
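To make the idea of handling NULLs above the bound lookup concrete, here is a minimal sketch in C. It is not taken from either patch: `ParamValue` and `prune_strict_eq` are hypothetical names, and a plain int array stands in for the list partition bounds. It only illustrates that a NULL run-time parameter compared with a strict operator can be turned into "no partitions" before any datum is compared, which mirrors what const-folding achieves at plan time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a run-time parameter value; not a PostgreSQL type. */
typedef struct ParamValue
{
    bool isnull;
    int  value;
} ParamValue;

/*
 * Hypothetical helper: return the index of the list partition that accepts
 * "key = param", or -1 if no partition can contain a matching row.
 */
static int
prune_strict_eq(const int *bounds, size_t nbounds, ParamValue param)
{
    size_t i;

    assert(bounds != NULL || nbounds == 0);

    /*
     * A strict operator never returns true for a NULL input, so the clause
     * behaves like a constant-false qual: prune everything without ever
     * looking at the bounds.
     */
    if (param.isnull)
        return -1;

    for (i = 0; i < nbounds; i++)
    {
        if (bounds[i] == param.value)
            return (int) i;
    }

    return -1;
}
```

With bounds {1, 2}, a non-NULL parameter 1 selects partition 0, while a NULL parameter selects nothing, regardless of what the bounds contain.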
Thanks,
Amit
On Wed, Feb 21, 2018 at 5:44 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached updated patches.
Committed 0001 and 0002.
I'm having some difficulty wrapping my head around 0003 because it has
minimal comments and no useful commit message. I think, though, that
it's actually broken. Pre-patch, the logic in find_partition_scheme
compares partopfamily, partopcintype, and parttypcoll and then asserts
equality for parttyplen and parttypbyval; not coincidentally,
PartitionSchemeData describes the latter two fields only as "cached
data", so that the segregation of fields in PartitionSchemeData into
two groups exactly matches what find_partition_scheme is actually
doing. However, with the patch, it turns into a sort of hodgepodge.
parttypid is added into the "cached" section of PartitionSchemeData
and partcollation to the primary section, but both values are
compared, not asserted; parttypcoll moves from the "compared" section
to the "asserted" section but the declaration in PartitionSchemeData
stays where it was.
Moreover, there's no explanation of why this is getting changed.
There's an existing comment that explains the motivation for what the
code does today, which the patch does not modify:
* We store the opclass-declared input data types instead of the partition key
* datatypes since the former rather than the latter are used to compare
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
Obviously, this raises the issue of whether changing this is really
the right thing to do in the first place, but at any rate it's
certainly necessary for the comments to match what the code actually
does.
Also, I find this not very helpful:
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
If the comments about parttypcoll and partcollation were clear enough
(and I think they could use some work to distinguish them better),
then this would be pretty much unnecessary -- clearly the only reason
to store two things is if they might be different from each other.
It might be more useful to somehow explain how parttypid and
partsupfunc are intended to work/be used, but actually I don't
think any satisfactory explanation is possible. Either we have one
partition scheme per partopcintype -- in which case parttypid is
ill-defined because it could vary among relations with the same
PartitionScheme -- or we have one per parttypid -- in which case,
without some other change, partition-wise join will stop working
between relations with different parttypids but the same
partopcintype.
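The compared-versus-asserted split described above can be sketched as follows. This is a simplified illustration of the pre-patch logic as described in this message, not the actual find_partition_scheme code: `SchemeSketch`, `scheme_matches`, and `MAX_KEYS` are hypothetical names, and fixed-size arrays replace the palloc'd ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;       /* simplified stand-in for PostgreSQL's Oid */

#define MAX_KEYS 4              /* hypothetical limit for this sketch */

typedef struct SchemeSketch
{
    int     partnatts;

    /* Primary fields: these decide whether two schemes are the same. */
    Oid     partopfamily[MAX_KEYS];
    Oid     partopcintype[MAX_KEYS];
    Oid     parttypcoll[MAX_KEYS];

    /* Cached fields: implied by the primary fields, never compared. */
    short   parttyplen[MAX_KEYS];
    bool    parttypbyval[MAX_KEYS];
} SchemeSketch;

static bool
scheme_matches(const SchemeSketch *a, const SchemeSketch *b)
{
    size_t  n;

    if (a->partnatts != b->partnatts)
        return false;
    n = (size_t) a->partnatts;

    /* Compare the primary fields; any difference means a different scheme. */
    if (memcmp(a->partopfamily, b->partopfamily, sizeof(Oid) * n) != 0 ||
        memcmp(a->partopcintype, b->partopcintype, sizeof(Oid) * n) != 0 ||
        memcmp(a->parttypcoll, b->parttypcoll, sizeof(Oid) * n) != 0)
        return false;

    /*
     * The cached fields are expected to agree whenever the primary fields
     * do, so they are only asserted.
     */
    assert(memcmp(a->parttyplen, b->parttyplen, sizeof(short) * n) == 0);
    assert(memcmp(a->parttypbyval, b->parttypbyval, sizeof(bool) * n) == 0);

    return true;
}
```

The point of keeping the two groups distinct is that a field which is merely asserted must be derivable from the compared ones; a field that can vary independently has to be compared, or it is ill-defined for the shared scheme.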
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/23 23:46, Robert Haas wrote:
On Wed, Feb 21, 2018 at 5:44 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached updated patches.
Committed 0001 and 0002.
Thank you for committing and for the review.
I'm having some difficulty wrapping my head around 0003 because it has
minimal comments and no useful commit message. I think, though, that
it's actually broken. Pre-patch, the logic in find_partition_scheme
compares partopfamily, partopcintype, and parttypcoll and then asserts
equality for parttyplen and parttypbyval; not coincidentally,
PartitionSchemeData describes the latter two fields only as "cached
data", so that the segregation of fields in PartitionSchemeData into
two groups exactly matches what find_partition_scheme is actually
doing. However, with the patch, it turns into a sort of hodgepodge.
parttypid is added into the "cached" section of PartitionSchemeData
and partcollation to the primary section, but both values are
compared, not asserted; parttypcoll moves from the "compared" section
to the "asserted" section but the declaration in PartitionSchemeData
stays where it was.
Moreover, there's no explanation of why this is getting changed.
There's an existing comment that explains the motivation for what the
code does today, which the patch does not modify:
* We store the opclass-declared input data types instead of the partition key
* datatypes since the former rather than the latter are used to compare
* partition bounds. Since partition key data types and the opclass declared
* input data types are expected to be binary compatible (per ResolveOpClass),
* both of those should have same byval and length properties.
Obviously, this raises the issue of whether changing this is really
the right thing to do in the first place, but at any rate it's
certainly necessary for the comments to match what the code actually
does.
Also, I find this not very helpful:
+ * The collation of the partition key can differ from the collation of the
+ * underlying column, so we must store this separately.
If the comments about parttypcoll and partcollation were clear enough
(and I think they could use some work to distinguish them better),
then this would be pretty much unnecessary -- clearly the only reason
to store two things is if they might be different from each other.
It might be more useful to somehow explain how parttypid and
partsupfunc are intended to work/be used, but actually I don't
think any satisfactory explanation is possible. Either we have one
partition scheme per partopcintype -- in which case parttypid is
ill-defined because it could vary among relations with the same
PartitionScheme -- or we have one per parttypid -- in which case,
without some other change, partition-wise join will stop working
between relations with different parttypids but the same
partopcintype.
I think I'm convinced that partopcintype OIDs can be used where I thought
parttypid ones were necessary. The pruning patch uses the respective OID
from this array when extracting the datum from an OpExpr to be compared
with the partition bound datums. It's sensible, I now think, to require
the extracted datum to be of partition opclass declared input type, rather
than the type of the partition key involved. So, I removed the parttypid
that I'd added to PartitionSchemeData.
I updated the comments to clarify the distinction between parttypcoll and
partcollation and the purpose of having both. I also expanded the comment
about partsupfunc a bit.
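As a rough illustration of why the two collations are kept apart, the sketch below uses two toy "collations" (case-sensitive and case-insensitive) that are purely hypothetical and do not correspond to real PostgreSQL collation OIDs. `KeySketch`, `compare_with_collation`, and `find_list_partition` are likewise invented names. The only point it makes is that the bound lookup consults partcollation, while parttypcoll plays no part in the comparison.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy collations for illustration only; not real PostgreSQL collations. */
typedef enum
{
    COLL_CASE_SENSITIVE,
    COLL_CASE_INSENSITIVE
} CollSketch;

typedef struct KeySketch
{
    CollSketch parttypcoll;     /* collation implied by the key's type */
    CollSketch partcollation;   /* collation specified for partitioning */
} KeySketch;

static int
compare_with_collation(const char *a, const char *b, CollSketch coll)
{
    if (coll == COLL_CASE_SENSITIVE)
        return strcmp(a, b);

    /* Case-insensitive comparison, one byte at a time. */
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == '\0')
            return 0;
    }
}

/*
 * Return the index of the bound equal to 'value' under the partitioning
 * collation, or -1 if none matches. parttypcoll is deliberately unused.
 */
static int
find_list_partition(const KeySketch *key, const char *const *bounds,
                    int nbounds, const char *value)
{
    int i;

    for (i = 0; i < nbounds; i++)
    {
        if (compare_with_collation(bounds[i], value,
                                   key->partcollation) == 0)
            return i;
    }

    return -1;
}
```

If the lookup used parttypcoll instead, a table partitioned with a collation different from the column's could be pruned incorrectly, which is the situation the updated comments are meant to rule out.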
Attached updated patches.
Thanks,
Amit
Attachments:
v33-0001-Add-partcollation-and-partsupfunc-to-PartitionSc.patch (text/plain; charset=UTF-8)
From 5a2376376b49aeff814d2b1e473394b2377a6370 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v33 1/3] Add partcollation and partsupfunc to
PartitionSchemeData
Partitioning-specific collation denoted by partcollation may be
different from the collation of the partition key's type. When
performing partitioning-specific optimizations within the planner,
such as when determining which partitions to prune based on a given
operator clause, we must use that collation. Also, two logically
identical partition schemes must have the same partcollation.
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 15 ++++++++++++++-
src/include/nodes/relation.h | 19 ++++++++++++++++++-
2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..fdcbc2f513 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1886,12 +1886,16 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /*
+ * Match the partition key types and partitioning-specific collations.
+ */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
@@ -1930,6 +1934,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
@@ -1938,6 +1946,11 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..0d918b5643 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -349,11 +350,27 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+
+ /*
+ * We store both the collation implied by the partition key's type and the
+ * one specified for partitioning. Values in the former are used as
+ * varcollid in the Vars corresponding to simple column partition keys so
+ * as to make them match corresponding Vars appearing elsewhere in the
+ * query tree. Whereas, the latter is used when actually comparing values
+ * against partition bounds datums, such as, when doing partition pruning.
+ */
+ Oid *parttypcoll;
+ Oid *partcollation;
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /*
+ * Cached array of partitioning comparison functions' fmgr structs. We
+ * don't compare these when trying to match two partition schemes.
+ */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v33-0002-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 95cd3ac12085f41833ce60b27c2cf001f195b57d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v33 2/3] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
Dilip Kumar (dilipbalaut@gmail.com),
---
src/backend/catalog/partition.c | 669 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1523 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 92 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2913 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8c9a11493..6a2761c350 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the index of partitions that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+
+ /* Some partitions might have to be removed from result */
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous manner,
+ * not all values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkey is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff++;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff--;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff++;
+ else
+ maxoff--;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff++;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff++;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but it may not be the case. It might actually be the
+ * upper bound of an unassigned range of values, which if so, move
+ * minoff/maxoff to the adjacent bound which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..a9eba3a831 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..d8e126f21c
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1523 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. Caller must have set up a
+ * valid partition pruning context in the form of struct PartitionPruneContext,
+ * that is, each of its fields other than clauseinfo must be valid before
+ * calling here. After extracting relevant clauses, clauseinfo is filled with
+ * information that will be used for actual pruning.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on relevant partitioning
+ * clauses. Caller must have called generate_partition_clauses() at least
+ * once and hence a valid partition pruning context must have already been
+ * created. In particular, PartitionPruneContext.clauseinfo must contain valid
+ * information. Partition pruning proceeds by extracting constant values
+ * from the clauses and comparing them with the partition bounds while also
+ * taking into account strategies of the operators in the matched clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses in which at least one argument clause matches a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is itself a List of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static void extract_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Set up the partition pruning context. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses; context.clauseinfo will be set */
+ generate_partition_clauses(&context, clauses);
+
+ if (!context.clauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes = get_partitions_from_clauses(&context);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Analyzes clauses to find those that match the partition key and sets
+ * context->clauseinfo
+ *
+ * Ideally, this should be called only once for a given query and a given
+ * partitioned table.
+ */
+void
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* And away we go to do the real work; context->clauseinfo will be set */
+ extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in context->clauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context)
+{
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its slot in minimalclauses
+ * with the most restrictive of the clauses from the corresponding
+ * list in context->clauseinfo.
+ */
+ remove_redundant_clauses(context, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, minimalclauses, &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = (List *) lfirst(lc);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * This adds matched clauses to the list corresponding to particular key
+ * in context->clauseinfo. Also collects other useful clauses to assist
+ * in partition elimination, such as OR clauses, clauses containing <>
+ * operator, and IS [NOT] NULL clauses
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets
+ * context->clauseinfo's 'constfalse' to true and exits immediately without
+ * processing any further clauses. In this case, the caller must be careful
+ * not to assume the context->clauseinfo is fully populated with all clauses.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of important lists before passing them to this
+ * function.
+ */
+static void
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ context->clauseinfo = partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems, j;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (j = 0; j < num_elems; j++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[j])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[j],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc;
+
+ /*
+ * When matching an OR expression, we only check that at least one of its
+ * args matches the partition key, not all of them. For an argument that
+ * doesn't match, we cannot eliminate any of the table's partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ foreach(lc, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc));
+ PartitionPruneContext subcontext;
+ Bitmapset *arg_partset;
+
+ /*
+ * All fields except clauseinfo are the same as in the parent context;
+ * clauseinfo will be set by calling extract_partition_clauses().
+ */
+ memcpy(&subcontext, context, sizeof(PartitionPruneContext));
+ extract_partition_clauses(&subcontext, clauses);
+
+ if (!subcontext.clauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!subcontext.clauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(&subcontext);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Processes the clauses contained in context->clauseinfo to remove the
+ * ones that are superseded by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, x = 5 AND x < 5. The function
+ * returns right after finding such a clause and, before returning, sets
+ * constfalse in context->clauseinfo to inform the caller that we found such
+ * a clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ PartitionClauseInfo *partclauseinfo = context->clauseinfo;
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse in context->clauseinfo and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc', and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(partopcintype, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(partopcintype, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process clauses in context->clauseinfo and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitioning doesn't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ PartitionClauseInfo *clauseinfo = context->clauseinfo;
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->partopcintype[0],
+ pc->value, &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid partopcintype, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != partopcintype)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ partopcintype, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
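To make the reduction done by remove_redundant_clauses() easier to follow, here is a minimal standalone sketch for a single integer key. It is not part of the patch: the names Strategy, Clause, apply_op() and reduce_clauses() are hypothetical, plain int comparisons stand in for partition_cmp_args(), and the "could not compare" case is assumed never to happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the five btree strategies, 0-based. */
typedef enum { ST_LT = 0, ST_LE, ST_EQ, ST_GE, ST_GT, ST_MAX } Strategy;

/* A clause of the form "key op value". */
typedef struct
{
    Strategy op;
    int      value;
} Clause;

/* Evaluate "left op right"; stands in for partition_cmp_args(). */
static bool
apply_op(Strategy op, int left, int right)
{
    switch (op)
    {
        case ST_LT: return left < right;
        case ST_LE: return left <= right;
        case ST_EQ: return left == right;
        case ST_GE: return left >= right;
        case ST_GT: return left > right;
        default:    return false;
    }
}

/*
 * Leaves in best[s] the surviving clause for each strategy, or NULL.
 * Returns false if the clauses are found to be mutually contradictory.
 */
static bool
reduce_clauses(const Clause *clauses, int nclauses, const Clause *best[ST_MAX])
{
    int s, i;

    for (s = 0; s < ST_MAX; s++)
        best[s] = NULL;

    /* Step 1: keep the most restrictive clause within each strategy. */
    for (i = 0; i < nclauses; i++)
    {
        const Clause *pc = &clauses[i];

        assert(pc->op < ST_MAX);
        if (best[pc->op] == NULL ||
            apply_op(pc->op, pc->value, best[pc->op]->value))
            best[pc->op] = pc;
        else if (pc->op == ST_EQ)
            return false;       /* e.g. a = 5 AND a = 7 */
    }

    /* Step 2: an equality clause either contradicts or replaces the rest. */
    if (best[ST_EQ])
    {
        for (s = 0; s < ST_MAX; s++)
        {
            if (s == ST_EQ || best[s] == NULL)
                continue;
            if (!apply_op((Strategy) s, best[ST_EQ]->value, best[s]->value))
                return false;   /* e.g. a = 5 AND a < 5 */
            best[s] = NULL;
        }
    }

    /* Step 3: keep only one of <, <= and only one of >, >=. */
    if (best[ST_LT] && best[ST_LE])
    {
        if (best[ST_LT]->value <= best[ST_LE]->value)
            best[ST_LE] = NULL;
        else
            best[ST_LT] = NULL;
    }
    if (best[ST_GT] && best[ST_GE])
    {
        if (best[ST_GT]->value >= best[ST_GE]->value)
            best[ST_GE] = NULL;
        else
            best[ST_GT] = NULL;
    }
    return true;
}
```

Like the patch, this sketch only detects contradictions that involve an equality clause; it makes no attempt to compare a lower bound against an upper bound.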
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fdcbc2f513..6eba13c244 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..d24f64e087 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,94 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+
+ /* Information about matched clauses */
+ struct PartitionClauseInfo *clauseinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +161,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
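The prefix rule that extract_bounding_datums() applies when filling PartScanKeyInfo for a range-partitioned table can be sketched as below. This is only an illustration, not code from the patch: KeyClause, ScanKeys and build_scan_keys() are hypothetical names, datums are plain ints, hash partitioning and <> clauses are left out, and each key is assumed to carry at most one clause per bound direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_KEYS 4              /* hypothetical; stands in for PARTITION_MAX_KEYS */

typedef enum { OP_EQUAL, OP_LESS, OP_GREATER } OpKind;

typedef struct
{
    int    key;                 /* 0-based partition key column */
    OpKind kind;
    bool   incl;                /* true for =, <=, >= */
    int    value;
} KeyClause;

typedef struct
{
    int  eqkeys[MAX_KEYS], minkeys[MAX_KEYS], maxkeys[MAX_KEYS];
    int  n_eqkeys, n_minkeys, n_maxkeys;
    bool min_incl, max_incl;
} ScanKeys;

static void
build_scan_keys(const KeyClause *clauses, int nclauses, int nkeys,
                ScanKeys *keys)
{
    bool need_eq = true, need_min = true, need_max = true;

    assert(nkeys <= MAX_KEYS);
    memset(keys, 0, sizeof(*keys));

    for (int i = 0; i < nkeys; i++)
    {
        /* A gap in an earlier column ends the usable prefix. */
        if (i > keys->n_eqkeys)  need_eq = false;
        if (i > keys->n_minkeys) need_min = false;
        if (i > keys->n_maxkeys) need_max = false;

        for (int c = 0; c < nclauses; c++)
        {
            const KeyClause *cl = &clauses[c];

            if (cl->key != i)
                continue;
            switch (cl->kind)
            {
                case OP_EQUAL:  /* contributes to all three tuples */
                    if (need_eq)
                    {
                        keys->eqkeys[i] = cl->value;
                        keys->n_eqkeys++;
                    }
                    if (need_max)
                    {
                        keys->maxkeys[i] = cl->value;
                        keys->n_maxkeys++;
                        keys->max_incl = true;
                    }
                    if (need_min)
                    {
                        keys->minkeys[i] = cl->value;
                        keys->n_minkeys++;
                        keys->min_incl = true;
                    }
                    break;
                case OP_LESS:
                    if (need_max)
                    {
                        keys->maxkeys[i] = cl->value;
                        keys->n_maxkeys++;
                        keys->max_incl = cl->incl;
                        if (!cl->incl)
                            need_eq = need_max = false;
                    }
                    break;
                case OP_GREATER:
                    if (need_min)
                    {
                        keys->minkeys[i] = cl->value;
                        keys->n_minkeys++;
                        keys->min_incl = cl->incl;
                        if (!cl->incl)
                            need_eq = need_min = false;
                    }
                    break;
            }
        }
    }

    /* Either a complete equality tuple or min/max bounds, never both. */
    if (keys->n_eqkeys == nkeys)
        keys->n_minkeys = keys->n_maxkeys = 0;
    else
        keys->n_eqkeys = 0;
}
```

For a three-column key with clauses a = 1, b = 1 and c < 8, the sketch yields a two-column inclusive minimum tuple (1, 1) and a three-column exclusive maximum tuple (1, 1, 8), with no equality tuple.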
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 0d918b5643..d9b1a433b2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -546,6 +546,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -674,6 +676,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..2b84ed90bf
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern void generate_partition_clauses(PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context);
+
+#endif /* PARTPRUNE_H */
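As a rough illustration of how get_partitions_from_or_args() combines the arms of an OR clause, the sketch below uses a 64-bit mask in place of a Bitmapset. It is not taken from the patch: OrArm, all_partitions() and partitions_for_or() are hypothetical names, and the per-arm results that the real code obtains from extract_partition_clauses(), predicate_refuted_by() and get_partitions_from_clauses() are simply assumed to be given as inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Precomputed facts about one OR arm (all hypothetical inputs). */
typedef struct
{
    bool     matches_key;       /* does the arm reference the partition key? */
    bool     refutes_parent;    /* does it contradict the table's own
                                 * partition constraint? */
    bool     constfalse;        /* is the arm self-contradictory? */
    uint64_t partset;           /* bit i set = partition i satisfies the arm */
} OrArm;

/* Mask with the low nparts bits set. */
static uint64_t
all_partitions(int nparts)
{
    assert(nparts > 0 && nparts <= 64);
    return nparts == 64 ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
}

/* Union of the partitions selected by each arm. */
static uint64_t
partitions_for_or(const OrArm *arms, int narms, int nparts)
{
    uint64_t result = 0;

    for (int i = 0; i < narms; i++)
    {
        if (!arms[i].matches_key)
        {
            /* The arm selects nothing from this table; skip it. */
            if (arms[i].refutes_parent)
                continue;

            /* Couldn't eliminate any of the partitions. */
            return all_partitions(nparts);
        }

        if (!arms[i].constfalse)
            result |= arms[i].partset;
    }

    return result;
}
```

A self-contradictory arm such as (a = 1 and a = 3) contributes nothing to the union, which is consistent with the regression test output in this patch where rlp2 no longer appears for that query.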
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..948cad4c3d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..08fc2dbc21 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
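
A note on the hash-partitioning tests in the patch above, which say that
"partial keys won't prune, nor would non-equality conditions". The reason is
that the target partition is derived from one hash computed over all key
columns, so the planner can pick a single remainder only when every column is
pinned down by an equality or an IS NULL clause. Below is a minimal sketch of
that rule, not the code from the patch. KeyInfo, toy_combine() and
toy_hash_prune() are names made up for illustration, and the mixing function
is a stand-in rather than PostgreSQL's own hash combiner. It assumes, as the
(null, null) row landing in hp0 suggests, that a NULL column contributes
nothing to the row hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the planner knows about one partition key column (hypothetical). */
typedef enum { KEY_UNKNOWN, KEY_IS_NULL, KEY_EQ_CONST } KeyState;

typedef struct
{
    KeyState state;
    uint64_t value_hash;    /* hash of the constant; only for KEY_EQ_CONST */
} KeyInfo;

/* Stand-in mixing step; this is NOT PostgreSQL's hash combiner. */
static uint64_t
toy_combine(uint64_t a, uint64_t b)
{
    a ^= b + UINT64_C(0x9e3779b97f4a7c15) + (a << 6) + (a >> 2);
    return a;
}

/*
 * Mark the partitions that must be scanned.  live[] has 'modulus' entries,
 * one per remainder.  Returns the number of live partitions.
 */
static int
toy_hash_prune(const KeyInfo *keys, int nkeys, int modulus, bool *live)
{
    uint64_t row_hash = 0;
    bool     all_known = true;
    int      nlive = 0;
    int      i;

    for (i = 0; i < nkeys; i++)
    {
        if (keys[i].state == KEY_UNKNOWN)
            all_known = false;      /* cannot compute the row hash */
        else if (keys[i].state == KEY_EQ_CONST)
            row_hash = toy_combine(row_hash, keys[i].value_hash);
        /* KEY_IS_NULL: assumed to leave the row hash unchanged */
    }

    for (i = 0; i < modulus; i++)
    {
        /* Without the full key, every remainder stays a candidate. */
        live[i] = !all_known ||
            (uint64_t) i == row_hash % (uint64_t) modulus;
        if (live[i])
            nlive++;
    }
    return nlive;
}
```

With two key columns and modulus 4, a clause set such as "a = 1" alone leaves
all four partitions live, while "a = 1 and b is null" narrows the scan to one,
which matches the shape of the expected plans in the test output.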
v33-0003-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From a7d7418c65166695a499e1485408335539659121 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v33 3/3] Add only unpruned partitioned child rels to
partitioned_rels
The (Merge)Append and ModifyTable planner nodes each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet known
which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node or some
of the related code in prepunion.c.
---
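To illustrate the "bubble up" idea described in the commit message: each
partitioned rel starts with a list holding only its own RT index, and while
paths are built, every unpruned child's list is appended to its parent's.
What follows is only a rough, self-contained sketch of that traversal. ToyRel
and collect_partitioned_rels() are hypothetical names that do not appear in
the patch, and a plain int array stands in for the planner's integer List.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8
#define MAX_RELS 32

/* Toy stand-in for a RelOptInfo in a partition tree (hypothetical). */
typedef struct ToyRel
{
    int            rti;          /* range table index */
    bool           partitioned;  /* is this a partitioned table? */
    bool           pruned;       /* proven empty, i.e. a dummy rel */
    int            nchildren;
    struct ToyRel *children[MAX_CHILDREN];
} ToyRel;

/*
 * Append to out[] the RT index of 'rel' and of every partitioned
 * descendant that is reachable without crossing a pruned rel.
 * 'n' is the number of entries already in out[]; returns the new count.
 */
static int
collect_partitioned_rels(const ToyRel *rel, int *out, int n)
{
    int i;

    /* Pruned rels and leaf partitions contribute nothing. */
    if (rel->pruned || !rel->partitioned)
        return n;

    assert(n < MAX_RELS);
    out[n++] = rel->rti;        /* the rel's own index comes first */

    /* Then bubble up whatever each live child collected. */
    for (i = 0; i < rel->nchildren; i++)
        n = collect_partitioned_rels(rel->children[i], out, n);

    return n;
}
```

For a root (RT index 1) with one live partitioned child (2) and one pruned
partitioned child (3), the result holds 1 and 2 only, which is the point of
the patch: a pruned sub-partitioned table no longer shows up in
partitioned_rels.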
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 266a3ef8ef..169c697c08 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5040,9 +5025,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index bbffc87842..2021c085d5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3185,9 +3175,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a9eba3a831..17eae105ec 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..8fa90b1f48 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5931,65 +5952,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d9b1a433b2..b6950a197e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -252,8 +252,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -318,6 +316,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -550,6 +551,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -682,6 +686,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2134,27 +2139,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
On Sun, Feb 25, 2018 at 11:10 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I think I'm convinced that partopcintype OIDs can be used where I thought
parttypid ones were necessary. The pruning patch uses the respective OID
from this array when extracting the datum from an OpExpr to be compared
with the partition bound datums. It's sensible, I now think, to require
the extracted datum to be of partition opclass declared input type, rather
than the type of the partition key involved. So, I removed the parttypid
that I'd added to PartitionSchemeData.
Updated the comments to make clear the distinction between and purpose of
having both parttypcoll and partcollation. Also expanded the comment
about partsupfunc a bit.
I don't think this fundamentally fixes the problem, although it does
narrow it. By requiring partcollation to match across every relation
with the same PartitionScheme, you're making partition-wise join fail
to work in some cases where it previously did. Construct a test case
where parttypcoll matches and partcollation doesn't; then, without the
patch, the two relations will have the same PartitionScheme and thus
be eligible for a partition-wise join, but with the patch, they will
have different PartitionSchemes and thus won't.
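To make that scenario concrete, here is a minimal sketch of the two matching rules. It is not the planner code: DemoPartKey and the demo_* functions are hypothetical stand-ins for the fields that find_partition_scheme() compares, the partitioning strategy check is omitted, and the OID values are arbitrary placeholders.

```c
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical, pared-down stand-in for the compared partition key fields. */
typedef struct DemoPartKey
{
	int			partnatts;
	Oid			partopfamily[2];
	Oid			partopcintype[2];
	Oid			parttypcoll[2];
	Oid			partcollation[2];
} DemoPartKey;

/* Matching without the patch: partcollation is not looked at. */
static bool
demo_scheme_matches_old(const DemoPartKey *a, const DemoPartKey *b)
{
	size_t		len;

	if (a->partnatts != b->partnatts)
		return false;
	len = sizeof(Oid) * a->partnatts;
	return memcmp(a->partopfamily, b->partopfamily, len) == 0 &&
		memcmp(a->partopcintype, b->partopcintype, len) == 0 &&
		memcmp(a->parttypcoll, b->parttypcoll, len) == 0;
}

/* Matching with the patch: partcollation must agree as well. */
static bool
demo_scheme_matches_new(const DemoPartKey *a, const DemoPartKey *b)
{
	return demo_scheme_matches_old(a, b) &&
		memcmp(a->partcollation, b->partcollation,
			   sizeof(Oid) * a->partnatts) == 0;
}

/*
 * Two single-column keys that agree on everything except partcollation.
 * Returns 1 if the old rule matches them and the new rule does not.
 */
static int
demo_collation_mismatch_case(void)
{
	DemoPartKey p = {1, {10}, {20}, {30}, {40}};	/* placeholder OIDs */
	DemoPartKey q = {1, {10}, {20}, {30}, {41}};	/* only partcollation differs */

	return demo_scheme_matches_old(&p, &q) &&
		!demo_scheme_matches_new(&p, &q);
}
```

With keys like p and q above, the old rule puts both relations under one PartitionScheme while the new rule separates them, which is the behavior change in question.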
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/27 3:27, Robert Haas wrote:
On Sun, Feb 25, 2018 at 11:10 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I think I'm convinced that partopcintype OIDs can be used where I thought
parttypid ones were necessary. The pruning patch uses the respective OID
from this array when extracting the datum from an OpExpr to be compared
with the partition bound datums. It's sensible, I now think, to require
the extracted datum to be of partition opclass declared input type, rather
than the type of the partition key involved. So, I removed the parttypid
that I'd added to PartitionSchemeData.
Updated the comments to make clear the distinction between and purpose of
having both parttypcoll and partcollation. Also expanded the comment
about partsupfunc a bit.
I don't think this fundamentally fixes the problem, although it does
narrow it. By requiring partcollation to match across every relation
with the same PartitionScheme, you're making partition-wise join fail
to work in some cases where it previously did. Construct a test case
where parttypcoll matches and partcollation doesn't; then, without the
patch, the two relations will have the same PartitionScheme and thus
be eligible for a partition-wise join, but with the patch, they will
have different PartitionSchemes and thus won't.
I may be confused, but shouldn't two tables partitioned on the same column
(of the same collatable type), but using different collations for
partitioning, end up with different PartitionSchemes? Different
partitioning collations would mean that the same data may end up in different
partitions of the respective tables.
create table p (a text) partition by range (a collate "en_US");
create table p1 partition of p for values from ('a') to ('m');
create table p2 partition of p for values from ('m') to ('z ');
create table q (a text) partition by range (a collate "C");
create table q1 partition of q for values from ('a') to ('m');
create table q2 partition of q for values from ('m') to ('z ');
insert into p values ('A');
INSERT 0 1
insert into q values ('A');
ERROR: no partition of relation "q" found for row
DETAIL: Partition key of the failing row contains (a) = (A).
You may say that partition bounds might have to be different too in this
case and hence partition-wise join won't occur anyway, but I'm wondering
if the mismatch of partcollation itself isn't enough to conclude that?
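As a rough illustration of why the insert into q fails, the sketch below routes a key using plain bytewise comparison, which is how the "C" collation orders strings. The demo_* functions are hypothetical and only mimic the bounds of q1 and q2 above; they are not the tuple routing code, and the "en_US" side is not modeled because its ordering depends on the locale data installed.

```c
#include <stdbool.h>
#include <string.h>

/* True if lower <= key < upper under bytewise ("C" collation) ordering. */
static bool
demo_in_range_c(const char *key, const char *lower, const char *upper)
{
	return strcmp(key, lower) >= 0 && strcmp(key, upper) < 0;
}

/*
 * Returns 1 for q1 ['a', 'm'), 2 for q2 ['m', 'z '), or -1 if no partition
 * accepts the key.
 */
static int
demo_route_c(const char *key)
{
	if (demo_in_range_c(key, "a", "m"))
		return 1;
	if (demo_in_range_c(key, "m", "z "))
		return 2;
	return -1;
}
```

Since 'A' (0x41) sorts before 'a' (0x61) bytewise, demo_route_c("A") finds no partition, matching the error shown for q.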
Thanks,
Amit
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
Thanks,
Amit
[1]: /messages/by-id/CAKJS1f8+p-mXfFUiwR4xZ37STvgJeYF44yAjo5Rfxf92cFJyYQ@mail.gmail.com
Attachments:
v34-0001-Add-partcollation-and-partsupfunc-to-PartitionSc.patch (text/plain; charset=UTF-8)
From bbe4420962e0567384496ddee830eb49ca471cc4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v34 1/5] Add partcollation and partsupfunc to
PartitionSchemeData
Partitioning-specific collation denoted by partcollation may be
different from the collation of the partition key's type. When
performing partitioning-specific optimizations within the planner,
such as when determining which partitions to prune based on a given
operator clause, we must use that collation. Also, two logically
identical partition schemes must have the same partcollation.
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 15 ++++++++++++++-
src/include/nodes/relation.h | 19 ++++++++++++++++++-
2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..fdcbc2f513 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1886,12 +1886,16 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /*
+ * Match the partition key types and partitioning-specific collations.
+ */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ sizeof(Oid) * partnatts) != 0 ||
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
@@ -1930,6 +1934,10 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
sizeof(Oid) * partnatts);
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
+ sizeof(Oid) * partnatts);
+
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
memcpy(part_scheme->parttyplen, partkey->parttyplen,
sizeof(int16) * partnatts);
@@ -1938,6 +1946,11 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index db8de2dfd0..c974788a22 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -351,11 +352,27 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+
+ /*
+ * We store both the collation implied by the partition key's type and the
+ * one specified for partitioning. Values in the former are used as
+ * varcollid in the Vars corresponding to simple column partition keys so
+ * as to make them match corresponding Vars appearing elsewhere in the
+ * query tree. Whereas, the latter is used when actually comparing values
+ * against partition bounds datums, such as, when doing partition pruning.
+ */
+ Oid *parttypcoll;
+ Oid *partcollation;
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /*
+ * Cached array of partitioning comparison functions' fmgr structs. We
+ * don't compare these when trying to match two partition schemes.
+ */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v34-0002-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 57649d2b0983bf2d0ff1cac537be09060226ac96 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v34 2/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 669 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1517 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 89 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 71 ++
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2950 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index f8c9a11493..6a2761c350 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+
+ /* Some partitions might have to be removed from result */
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not
+ * all values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many eqkeys as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff + 1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkeys is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff++;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff--;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff++;
+ else
+ maxoff--;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff++;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * as the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff++;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but that may not be the case. The bound might actually
+ * be the upper bound of an unassigned range of values; if so, move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a00eb..542c4a2bca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..905bd3571c
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1517 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * The following entry points exist in this module.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. The list of clauses is
+ * processed and a PartitionClauseInfo is returned which contains details of
+ * any clauses which could be matched to the partition keys of the relation
+ * defined in the context.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on 'partclauseinfo'. Caller
+ * must have called generate_partition_clauses() in order to have generated
+ * a valid PartitionClauseInfo. Partition pruning proceeds by extracting
+ * constant values from the clauses and comparing them with the partition bounds
+ * while also taking into account strategies of the operators in the matched
+ * clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static PartitionClauseInfo *extract_partition_clauses(
+ PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args,
+ List *or_arg_partclauseinfos);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ PartitionClauseInfo *clauseinfo,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ PartitionClauseInfo *partclauseinfo;
+ int partnatts = rel->part_scheme->partnatts,
+ i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses */
+ partclauseinfo = generate_partition_clauses(&context, clauses);
+
+ if (!partclauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes;
+
+ partindexes = get_partitions_from_clauses(&context,
+ partclauseinfo);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Processes 'clauses' and returns a PartitionClauseInfo which contains
+ * the details of any clauses which were matched to partition keys.
+ */
+PartitionClauseInfo *
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* pre-process the clauses and generate the PartitionClauseInfo */
+ return extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in partclauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo)
+{
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc1,
+ *lc2;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its element in
+ * minimalclauses with the most restrictive set of the clauses from
+ * the corresponding partition key in partclauseinfo.
+ */
+ remove_redundant_clauses(context, partclauseinfo, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, partclauseinfo, minimalclauses,
+ &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ forboth(lc1, partclauseinfo->or_clauses,
+ lc2, partclauseinfo->or_partclauseinfos)
+ {
+ List *or_args = (List *) lfirst(lc1);
+ List *or_arg_partclauseinfos = (List *) lfirst(lc2);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args,
+ or_arg_partclauseinfos);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * Returns a PartitionClauseInfo which stores the clauses which were
+ * matched to the partition key. The PartitionClauseInfo also collects
+ * other useful clauses to assist in partition elimination, such as OR
+ * clauses, clauses containing <> operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets the
+ * returned PartitionClauseInfo's 'constfalse' to true and exits immediately
+ * without processing any further clauses. The caller must then be careful
+ * not to assume the returned PartitionClauseInfo is fully populated.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of important lists before passing them to this
+ * function.
+ */
+static PartitionClauseInfo *
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->or_partclauseinfos = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+
+ /*
+ * Now pre-process any OR clauses found above and generate
+ * PartitionClauseInfos for them.
+ */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = lfirst(lc);
+ List *pclauselist = NIL;
+ ListCell *lc2;
+
+ foreach (lc2, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc2));
+ PartitionClauseInfo *orpartclauseinfo;
+
+ orpartclauseinfo = extract_partition_clauses(context, clauses);
+ pclauselist = lappend(pclauselist, orpartclauseinfo);
+ }
+
+ partclauseinfo->or_partclauseinfos =
+ lappend(partclauseinfo->or_partclauseinfos, pclauselist);
+ }
+
+ return partclauseinfo;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Does the clause match this partition key? */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args,
+ List *or_arg_partclauseinfos)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc1,
+ *lc2;
+
+ /*
+ * When matching an OR expression, it is only checked if at least one of
+ * its args matches the partition key, not all. For arguments that don't
+ * match, we cannot eliminate any of its partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ forboth(lc1, or_args, lc2, or_arg_partclauseinfos)
+ {
+ List *clauses = list_make1(lfirst(lc1));
+ PartitionClauseInfo *or_arg_partclauseinfo = lfirst(lc2);
+ Bitmapset *arg_partset;
+
+ if (!or_arg_partclauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!or_arg_partclauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(context,
+ or_arg_partclauseinfo);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Process 'partclauseinfo' to remove the clauses that are superseded
+ * by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, given x > 1 AND x > 2 AND x >= 5, the last clause is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 5. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in 'partclauseinfo' to inform the caller that we found such a
+ * clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set partclauseinfo->constfalse to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found a contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(partopcintype, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(partopcintype, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process 'clauseinfo' and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to
+ * determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL clauses.
+ * LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ PartitionClauseInfo *clauseinfo,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such a
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->partopcintype[0],
+ pc->value, &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid partopcintype, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != partopcintype)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ partopcintype, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index fdcbc2f513..6eba13c244 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..4e9281d3d5 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,91 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition look up keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality look up key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +158,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index c974788a22..a914aa9ea6 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -548,6 +548,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -676,6 +678,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..b654691e9b
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause matching a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is a List itself of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /*
+ * Each member is itself a List of PartitionClauseInfos for the arguments
+ * of a given OR clause. This list and or_clauses should be iterated
+ * together using the forboth() macro.
+ */
+ List *or_partclauseinfos;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Bit N (0 <= N < partnatts) is set if key N IS NULL / IS NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern PartitionClauseInfo *generate_partition_clauses(
+ PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..948cad4c3d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..08fc2dbc21 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
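
The hash partitioning tests in the patch above separate queries that are expected to prune from those that are not: pruning is expected only when every partition key is constrained by an equality or an IS NULL clause. Below is a minimal standalone sketch of that rule as the tests read to me. It is not code from the patch; KeyCond and hash_pruning_possible() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified summary of what the query says about one
 * partition key.  This is an illustration only, not a structure used by
 * the patch.
 */
typedef enum
{
    KEY_UNCONSTRAINED,  /* no clause matched this key */
    KEY_EQ,             /* key = const */
    KEY_IS_NULL,        /* key IS NULL */
    KEY_OTHER           /* <, >, <>, LIKE, ... */
} KeyCond;

/*
 * Returns true if a hash-partitioned table could be pruned, going by the
 * regression tests: every key must be fixed by equality or by IS NULL.
 */
static bool
hash_pruning_possible(const KeyCond *conds, int partnatts)
{
    int i;

    assert(partnatts > 0);
    for (i = 0; i < partnatts; i++)
    {
        if (conds[i] != KEY_EQ && conds[i] != KEY_IS_NULL)
            return false;
    }
    return true;
}
```

With the two-key table hp (a, b) from the tests, "a = 1 and b is null" maps to {KEY_EQ, KEY_IS_NULL} and is prunable under this rule, while "a = 1" alone maps to {KEY_EQ, KEY_UNCONSTRAINED} and is not.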
Attachment: v34-0003-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 17ca98619fc94df2536d06eec5dc3be5da9609cd Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v34 3/5] Add only unpruned partitioned child rels to
partitioned_rels

The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that should be known.

That means we no longer need the PartitionedChildRelInfo node and
some related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
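
To illustrate the idea in the commit message, here is a toy model of collecting only the unpruned partitioned tables of a partition tree. It is a sketch under my own assumptions, not the code in allpaths.c: ToyRel, its fields, and collect_partitioned_rels() are hypothetical stand-ins, and the real planner works on RelOptInfos and Lists rather than fixed arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a relation in a partition tree. */
typedef struct ToyRel
{
    int         rti;            /* range table index */
    bool        is_partitioned; /* partitioned table (not a leaf)? */
    bool        pruned;         /* eliminated by partition pruning? */
    int         nchildren;
    const struct ToyRel *children[MAX_CHILDREN];
} ToyRel;

/*
 * Append the RT indexes of unpruned partitioned tables under 'rel' to
 * 'out', starting at position 'n'.  Returns the new element count.
 * A pruned partitioned table contributes neither itself nor anything
 * beneath it; leaf partitions are never recorded.
 */
static int
collect_partitioned_rels(const ToyRel *rel, int *out, int n, int cap)
{
    int i;

    if (rel->pruned || !rel->is_partitioned)
        return n;

    assert(n < cap);
    out[n++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
        n = collect_partitioned_rels(rel->children[i], out, n, cap);

    return n;
}
```

For a root with RT index 1 that has one pruned partitioned child (index 2) and one unpruned partitioned child (index 3), the function yields {1, 3}, whereas a list built before pruning would have contained all of 1, 2 and 3.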
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 266a3ef8ef..169c697c08 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5040,9 +5025,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index bbffc87842..2021c085d5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3185,9 +3175,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 542c4a2bca..08570ce25d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8f6cc559b..1671f450b0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -557,7 +557,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -572,6 +571,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1114,12 +1114,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1191,10 +1191,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1425,6 +1427,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1525,6 +1531,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1532,7 +1553,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5939,65 +5960,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a914aa9ea6..34d79f284b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -552,6 +553,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -684,6 +688,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2136,27 +2141,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
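The bubbling-up of partitioned_child_rels that the v34-0003 patch above describes can be sketched outside the planner roughly as follows. This is only an illustration under stated assumptions: SketchRel, sketch_append_index() and sketch_collect_partitioned_rels() are hypothetical stand-ins, not PostgreSQL structs or functions, and the sketch folds the set_append_rel_size() initialization and the set_append_rel_pathlist() bubbling into a single recursive pass, which the real patch does not do.

```c
#define SKETCH_MAX_RELS 16

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL struct. */
typedef struct SketchRel
{
	int			rti;			/* range table index of this rel */
	int			is_partitioned; /* nonzero for a partitioned table */
	int			is_dummy;		/* nonzero if the rel was pruned */
	int			partitioned_child_rels[SKETCH_MAX_RELS];
	int			nchild_rels;
	struct SketchRel *children[SKETCH_MAX_RELS];
	int			nchildren;
} SketchRel;

/* Append one RT index, silently ignoring overflow of the fixed array. */
void
sketch_append_index(SketchRel *rel, int rti)
{
	if (rel->nchild_rels < SKETCH_MAX_RELS)
		rel->partitioned_child_rels[rel->nchild_rels++] = rti;
}

/*
 * A partitioned rel starts with its own RT index, then receives the
 * lists of its non-dummy children, so pruned subtrees never show up.
 */
void
sketch_collect_partitioned_rels(SketchRel *rel)
{
	int			i;

	rel->nchild_rels = 0;
	if (rel->is_partitioned)
		sketch_append_index(rel, rel->rti);

	for (i = 0; i < rel->nchildren; i++)
	{
		SketchRel  *child = rel->children[i];
		int			j;

		/* Skip pruned children, as the patch does for dummy rels. */
		if (child->is_dummy)
			continue;

		sketch_collect_partitioned_rels(child);

		/* Bubble up the child's partitioned children. */
		if (rel->is_partitioned)
			for (j = 0; j < child->nchild_rels; j++)
				sketch_append_index(rel, child->partitioned_child_rels[j]);
	}
}
```

With a root at RT index 1 whose partitioned children are at 2 (live) and 3 (pruned), the root's list ends up holding only 1 and 2, which is the "only unpruned" behavior the patch subject refers to.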
On Mon, Feb 26, 2018 at 10:59 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
You may say that partition bounds might have to be different too in this
case and hence partition-wise join won't occur anyway, but I'm wondering
if the mismatch of partcollation itself isn't enough to conclude that?
Yeah, you're right. I think that this is just a bug in partition-wise
join, and that the partition scheme should just be using partcollation
instead of parttypcoll, as in the attached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
partition-scheme-collation.patch
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..1f406ca9b2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1891,7 +1891,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
@@ -1926,8 +1926,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->partopcintype, partkey->partopcintype,
sizeof(Oid) * partnatts);
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->parttypcoll,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index db8de2dfd0..d576aa7350 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -351,7 +351,7 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collations */
/* Cached information about partition key data types. */
int16 *parttyplen;
On 2018/02/28 1:05, Robert Haas wrote:
On Mon, Feb 26, 2018 at 10:59 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
You may say that partition bounds might have to be different too in this
case and hence partition-wise join won't occur anyway, but I'm wondering
if the mismatch of partcollation itself isn't enough to conclude that?
Yeah, you're right. I think that this is just a bug in partition-wise
join, and that the partition scheme should just be using partcollation
instead of parttypcoll, as in the attached.
Ah, OK. I was missing that there is no need to have both parttypcoll and
partcollation in PartitionSchemeData, as the Vars in rel->partexprs are
built from a bare PartitionKey (not PartitionSchemeData), and after that
point, parttypcoll no longer needs to be kept around.
I noticed that there is a typo in the patch.
+ memcpy(part_scheme->partcollation, partkey->parttypcoll,
s/parttypcoll/partcollation/g
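To illustrate why that one-word typo matters, here is a minimal sketch, not the plancat.c code. SketchKey, SketchScheme and both functions are hypothetical stand-ins, and the OID values used with them are made up. It assumes partnatts never exceeds SKETCH_MAX_KEYS. If the scheme is filled from parttypcoll but later compared against partcollation, a key whose partitioning collation differs from its type collation would not even match the scheme built from itself.

```c
#include <string.h>

#define SKETCH_MAX_KEYS 4

typedef unsigned int Oid;

/* Hypothetical stand-in for the relevant PartitionKey fields. */
typedef struct SketchKey
{
	int			partnatts;
	Oid			parttypcoll[SKETCH_MAX_KEYS];	/* collation of key type */
	Oid			partcollation[SKETCH_MAX_KEYS]; /* partitioning collation */
} SketchKey;

/* Hypothetical stand-in for PartitionSchemeData after the patch. */
typedef struct SketchScheme
{
	int			partnatts;
	Oid			partcollation[SKETCH_MAX_KEYS];
} SketchScheme;

/* Fill the scheme; with_typo reproduces copying from parttypcoll. */
void
sketch_fill_scheme(SketchScheme *scheme, const SketchKey *key, int with_typo)
{
	const Oid  *src = with_typo ? key->parttypcoll : key->partcollation;

	scheme->partnatts = key->partnatts;
	memcpy(scheme->partcollation, src, sizeof(Oid) * key->partnatts);
}

/* Compare the way the patched find_partition_scheme() hunk does. */
int
sketch_scheme_matches(const SketchKey *key, const SketchScheme *scheme)
{
	if (key->partnatts != scheme->partnatts)
		return 0;
	return memcmp(key->partcollation, scheme->partcollation,
				  sizeof(Oid) * key->partnatts) == 0;
}
```

For a one-column key with different values in parttypcoll and partcollation, sketch_scheme_matches() returns 0 after filling with the typo and 1 after filling from partcollation.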
BTW, should there be a relevant test in partition_join.sql? If so, I've
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
Also attached is an updated version of your patch with the typo fixed.
Thanks,
Amit
Attachments:
partition-scheme-collation-2.patch
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 60f21711f4..b799e249db 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1891,7 +1891,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
sizeof(Oid) * partnatts) != 0 ||
- memcmp(partkey->parttypcoll, part_scheme->parttypcoll,
+ memcmp(partkey->partcollation, part_scheme->partcollation,
sizeof(Oid) * partnatts) != 0)
continue;
@@ -1926,8 +1926,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->partopcintype, partkey->partopcintype,
sizeof(Oid) * partnatts);
- part_scheme->parttypcoll = (Oid *) palloc(sizeof(Oid) * partnatts);
- memcpy(part_scheme->parttypcoll, partkey->parttypcoll,
+ part_scheme->partcollation = (Oid *) palloc(sizeof(Oid) * partnatts);
+ memcpy(part_scheme->partcollation, partkey->partcollation,
sizeof(Oid) * partnatts);
part_scheme->parttyplen = (int16 *) palloc(sizeof(int16) * partnatts);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index db8de2dfd0..d576aa7350 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -351,7 +351,7 @@ typedef struct PartitionSchemeData
int16 partnatts; /* number of partition attributes */
Oid *partopfamily; /* OIDs of operator families */
Oid *partopcintype; /* OIDs of opclass declared input data types */
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+ Oid *partcollation; /* OIDs of partitioning collations */
/* Cached information about partition key data types. */
int16 *parttyplen;
partitionwise-join-collation-test-1.patch
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..6323a13777 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1869,3 +1869,31 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 FULL JOIN prt1 t2 ON (t1.c = t2.c);
-> Seq Scan on prt1_n_p2 t1_1
(10 rows)
+--
+-- No partition-wise join if partitioning collation doesn't match
+--
+CREATE TABLE posix_text (a text) PARTITION BY RANGE (a COLLATE "POSIX");
+CREATE TABLE posix_text1 PARTITION OF posix_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE posix_text2 PARTITION OF posix_text FOR VALUES FROM ('m') TO ('z ');
+CREATE TABLE c_text (a text) PARTITION BY RANGE (a COLLATE "C");
+CREATE TABLE c_text1 PARTITION OF c_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE c_text2 PARTITION OF c_text FOR VALUES FROM ('m') TO ('z ');
+EXPLAIN (COSTS OFF)
+SELECT * FROM posix_text p JOIN c_text c ON (p.a = c.a);
+ QUERY PLAN
+-----------------------------------------------
+ Merge Join
+ Merge Cond: (p.a = c.a)
+ -> Sort
+ Sort Key: p.a
+ -> Append
+ -> Seq Scan on posix_text1 p
+ -> Seq Scan on posix_text2 p_1
+ -> Sort
+ Sort Key: c.a
+ -> Append
+ -> Seq Scan on c_text1 c
+ -> Seq Scan on c_text2 c_1
+(12 rows)
+
+DROP TABLE posix_text, c_text;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index a2d8b1be55..0df7df487d 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -384,3 +384,19 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 JOIN prt2_n t2 ON (t1.c = t2.c) JOI
-- partitioned table
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 FULL JOIN prt1 t2 ON (t1.c = t2.c);
+
+--
+-- No partition-wise join if partitioning collation doesn't match
+--
+CREATE TABLE posix_text (a text) PARTITION BY RANGE (a COLLATE "POSIX");
+CREATE TABLE posix_text1 PARTITION OF posix_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE posix_text2 PARTITION OF posix_text FOR VALUES FROM ('m') TO ('z ');
+
+CREATE TABLE c_text (a text) PARTITION BY RANGE (a COLLATE "C");
+CREATE TABLE c_text1 PARTITION OF c_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE c_text2 PARTITION OF c_text FOR VALUES FROM ('m') TO ('z ');
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM posix_text p JOIN c_text c ON (p.a = c.a);
+
+DROP TABLE posix_text, c_text;
On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/02/28 1:05, Robert Haas wrote:
On Mon, Feb 26, 2018 at 10:59 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
You may say that partition bounds might have to be different too in this
case and hence partition-wise join won't occur anyway, but I'm wondering
if the mismatch of partcollation itself isn't enough to conclude that?
Yeah, you're right. I think that this is just a bug in partition-wise
join, and that the partition scheme should just be using partcollation
instead of parttypcoll, as in the attached.
Ah, OK. I was missing that there is no need to have both parttypcoll and
partcollation in PartitionSchemeData, as the Vars in rel->partexprs are
built from a bare PartitionKey (not PartitionSchemeData), and after that
point, parttypcoll no longer needs to be kept around.
Yes. That's right.
I noticed that there is a typo in the patch.
+ memcpy(part_scheme->partcollation, partkey->parttypcoll,
s/parttypcoll/partcollation/g
BTW, should there be a relevant test in partition_join.sql? If yes,
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
A partition-wise join path may be created but then discarded because of
its higher cost, in which case this test won't notice it. So, please add
some data, as the other tests do, and add a command to analyze the
partitioned tables. That protects against a case like that.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Feb 27, 2018 at 3:03 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
Some comments on 0001.
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /*
+ * Match the partition key types and partitioning-specific collations.
+ */
We are comparing opfamily and opclass input type as well, but this comment
doesn't explicitly mention those the way it mentions collation. Instead, I
think we should just say, "Match partition key type properties".
You had commented on "advanced partition matching code" about asserting that
partsupfuncs also match. Robert too has expressed a similar opinion upthread.
Maybe we should at least try to assert that the function OIDs match.
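One possible shape for such an assertion is sketched below. It is only an illustration: SketchFmgrInfo is a hypothetical stand-in that keeps just an fn_oid field, and sketch_supfuncs_match() and sketch_check_scheme() are made-up names, not functions from the patch.

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hypothetical stand-in that keeps only the function OID. */
typedef struct SketchFmgrInfo
{
	Oid			fn_oid;
} SketchFmgrInfo;

/* Return 1 if every support function OID matches, else 0. */
int
sketch_supfuncs_match(const SketchFmgrInfo *a, const SketchFmgrInfo *b,
					  int partnatts)
{
	int			i;

	for (i = 0; i < partnatts; i++)
	{
		if (a[i].fn_oid != b[i].fn_oid)
			return 0;
	}
	return 1;
}

/*
 * Once the other properties have matched, the support functions are
 * expected to match too, so this only asserts rather than comparing
 * them as part of the scheme lookup.
 */
void
sketch_check_scheme(const SketchFmgrInfo *key_funcs,
					const SketchFmgrInfo *scheme_funcs, int partnatts)
{
	assert(sketch_supfuncs_match(key_funcs, scheme_funcs, partnatts));
}
```

Keeping the comparison in a helper means the check costs nothing in non-assert builds, while the matching logic itself stays unchanged.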
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+
+ /*
+ * We store both the collation implied by the partition key's type and the
+ * one specified for partitioning. Values in the former are used as
+ * varcollid in the Vars corresponding to simple column partition keys so
+ * as to make them match corresponding Vars appearing elsewhere in the
+ * query tree. Whereas, the latter is used when actually comparing values
+ * against partition bounds datums, such as, when doing partition pruning.
+ */
+ Oid *parttypcoll;
+ Oid *partcollation;
As you have already mentioned upthread only partcollation is needed, not
parttypcoll.
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /*
+ * Cached array of partitioning comparison functions' fmgr structs. We
+ * don't compare these when trying to match two partition schemes.
+ */
I think this comment should go away. The second sentence doesn't explain why
and if it did so it should do that in find_partition_scheme() not here.
partsupfunc is another property of partition keys that is cached like
parttyplen, parttypbyval. Why does it deserve a separate comment when others
don't?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 27 February 2018 at 22:33, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
Thanks for fixing that up and including it.
Micro review of v34:
1. Looks like you've renamed the parttypid parameter in the definition
of partkey_datum_from_expr and partition_cmp_args, but not updated the
declaration too.
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+static bool
+partkey_datum_from_expr(Oid partopcintype, Expr *expr, Datum *value)
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool
+partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
2. In prune_append_rel_partitions(), it's not exactly illegal, but int
i is declared twice in different scopes. Looks like there's no need
for the inner one.
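To illustrate point 2, here is a minimal sketch of the pattern, not taken from partprune.c. Both function names and the flags array are hypothetical. The first function declares i again in an inner block, which is legal C but shadows the outer variable; the second reuses the single outer declaration and behaves identically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the inner "int i" shadows the outer one. */
int
count_flags_shadowed(const int *flags, int nflags)
{
	int			i;			/* outer declaration */
	int			count = 0;

	assert(flags != NULL && nflags >= 0);

	/* Count the non-zero entries. */
	for (i = 0; i < nflags; i++)
		if (flags[i] != 0)
			count++;

	{
		int			i;		/* legal, but hides the outer i */

		/* Discount the negative entries. */
		for (i = 0; i < nflags; i++)
			if (flags[i] < 0)
				count--;
	}

	return count;
}

/* Same logic with the redundant inner declaration removed. */
int
count_flags_single_decl(const int *flags, int nflags)
{
	int			i;
	int			count = 0;

	assert(flags != NULL && nflags >= 0);

	for (i = 0; i < nflags; i++)
		if (flags[i] != 0)
			count++;

	for (i = 0; i < nflags; i++)
		if (flags[i] < 0)
			count--;

	return count;
}
```

Compilers accept both forms; -Wshadow in GCC or Clang would flag the first one.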
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 27, 2018 at 8:12 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Ah, OK. I was missing that there is no need to have both parttypcoll and
partcollation in PartitionSchemeData, as the Vars in rel->partexprs are
built from a bare PartitionKey (not PartitionSchemeData), and after that
point, parttypcoll no longer needs to be kept around.
I noticed that there is a typo in the patch.
+ memcpy(part_scheme->partcollation, partkey->parttypcoll,
s/parttypcoll/partcollation/g
Committed your version.
BTW, should there be a relevant test in partition_join.sql? If yes,
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
I don't feel strongly about it, but I'm not going to try to prevent
you from adding one, either.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/02/28 19:14, Ashutosh Bapat wrote:
On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote:
BTW, should there be a relevant test in partition_join.sql? If yes,
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
A partition-wise join path will be created but discarded because of
higher cost. This test won't see it in that case. So, please add some
data like other tests and add command to analyze the partitioned
tables. That kind of protects from something like that.
Thanks for the review.
Hmm, the added test is such that the partition collations won't match, so
partition-wise join won't be considered at all due to differing
PartitionSchemes, unless I'm missing something.
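As a rough sketch of why differing collations rule out partition-wise join here: the planner only treats two tables as sharing a PartitionScheme if their partitioning properties compare equal, including the partitioning collation. The struct and function below are hypothetical, heavily simplified stand-ins for PartitionSchemeData and the matching loop in find_partition_scheme(), and the OID values used with them are arbitrary placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical, simplified stand-in for PartitionSchemeData. */
typedef struct MiniPartScheme
{
	int			partnatts;		/* number of partition key columns */
	const Oid  *partopfamily;	/* operator family per key column */
	const Oid  *partcollation;	/* partitioning collation per key column */
} MiniPartScheme;

/*
 * Two schemes match only if the key count, operator families and
 * partitioning collations are all identical.
 */
bool
mini_schemes_match(const MiniPartScheme *a, const MiniPartScheme *b)
{
	assert(a != NULL && b != NULL);

	if (a->partnatts != b->partnatts)
		return false;

	return memcmp(a->partopfamily, b->partopfamily,
				  sizeof(Oid) * a->partnatts) == 0 &&
		memcmp(a->partcollation, b->partcollation,
			   sizeof(Oid) * a->partnatts) == 0;
}
```

With one table partitioned on a COLLATE "POSIX" and the other on a COLLATE "C", the collation arrays differ, so under this model the two never share a scheme regardless of the data loaded or the statistics gathered.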
Thanks,
Amit
On 2018/03/01 2:23, Robert Haas wrote:
On Tue, Feb 27, 2018 at 8:12 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Ah, OK. I was missing that there is no need to have both parttypcoll and
partcollation in PartitionSchemeData, as the Vars in rel->partexprs are
built from a bare PartitionKey (not PartitionSchemeData), and after that
point, parttypcoll no longer needs to be kept around.
I noticed that there is a typo in the patch.
+ memcpy(part_scheme->partcollation, partkey->parttypcoll,
s/parttypcoll/partcollation/g
Committed your version.
Thank you.
BTW, should there be a relevant test in partition_join.sql? If yes,
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
I don't feel strongly about it, but I'm not going to try to prevent
you from adding one, either.
OK. Attached is a revised version of that patch in case you consider
committing it, addressing Ashutosh's comment that the tables used in the
test should contain some data.
Thanks,
Amit
Attachments:
partitionwise-join-collation-test-2.patch (text/plain; charset=UTF-8)
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..f076d15ced 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1869,3 +1869,35 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 FULL JOIN prt1 t2 ON (t1.c = t2.c);
-> Seq Scan on prt1_n_p2 t1_1
(10 rows)
+--
+-- No partition-wise join if partitioning collation doesn't match
+--
+CREATE TABLE posix_text (a text) PARTITION BY RANGE (a COLLATE "POSIX");
+CREATE TABLE posix_text1 PARTITION OF posix_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE posix_text2 PARTITION OF posix_text FOR VALUES FROM ('m') TO ('z ');
+INSERT INTO posix_text SELECT chr(97 + (i-1)%26::int) FROM generate_series(1, 599) i;
+ANALYZE posix_text;
+CREATE TABLE c_text (a text) PARTITION BY RANGE (a COLLATE "C");
+CREATE TABLE c_text1 PARTITION OF c_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE c_text2 PARTITION OF c_text FOR VALUES FROM ('m') TO ('z ');
+INSERT INTO c_text SELECT chr(97 + (i-1)%26::int) FROM generate_series(1, 599) i;
+ANALYZE c_text;
+EXPLAIN (COSTS OFF)
+SELECT p.a, c.a FROM posix_text p JOIN c_text c ON (p.a = c.a) ORDER BY 1;
+ QUERY PLAN
+-----------------------------------------------
+ Merge Join
+ Merge Cond: (p.a = c.a)
+ -> Sort
+ Sort Key: p.a
+ -> Append
+ -> Seq Scan on posix_text1 p
+ -> Seq Scan on posix_text2 p_1
+ -> Sort
+ Sort Key: c.a
+ -> Append
+ -> Seq Scan on c_text1 c
+ -> Seq Scan on c_text2 c_1
+(12 rows)
+
+DROP TABLE posix_text, c_text;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index a2d8b1be55..9dd15c1059 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -384,3 +384,23 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 JOIN prt2_n t2 ON (t1.c = t2.c) JOI
-- partitioned table
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_n t1 FULL JOIN prt1 t2 ON (t1.c = t2.c);
+
+--
+-- No partition-wise join if partitioning collation doesn't match
+--
+CREATE TABLE posix_text (a text) PARTITION BY RANGE (a COLLATE "POSIX");
+CREATE TABLE posix_text1 PARTITION OF posix_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE posix_text2 PARTITION OF posix_text FOR VALUES FROM ('m') TO ('z ');
+INSERT INTO posix_text SELECT chr(97 + (i-1)%26::int) FROM generate_series(1, 599) i;
+ANALYZE posix_text;
+
+CREATE TABLE c_text (a text) PARTITION BY RANGE (a COLLATE "C");
+CREATE TABLE c_text1 PARTITION OF c_text FOR VALUES FROM ('a') TO ('m');
+CREATE TABLE c_text2 PARTITION OF c_text FOR VALUES FROM ('m') TO ('z ');
+INSERT INTO c_text SELECT chr(97 + (i-1)%26::int) FROM generate_series(1, 599) i;
+ANALYZE c_text;
+
+EXPLAIN (COSTS OFF)
+SELECT p.a, c.a FROM posix_text p JOIN c_text c ON (p.a = c.a) ORDER BY 1;
+
+DROP TABLE posix_text, c_text;
Thanks Ashutosh and David for reviewing. Replying to both.
On 2018/02/28 20:25, Ashutosh Bapat wrote:
On Tue, Feb 27, 2018 at 3:03 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
Some comments on 0001.
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /*
+ * Match the partition key types and partitioning-specific collations.
+ */
We are comparing opfamily and opclass input type as well, but this comment
doesn't explicitly mention those like it mentions collation. Instead, I think
we should just say, "Match partition key type properties"
Sounds good, done.
You had commented on "advanced partition matching code" about asserting that
partsupfuncs also match. Robert too has expressed a similar opinion upthread.
Maybe we should at least try to assert that the function OIDs match.
I guess you're referring to this email of mine:
/messages/by-id/e681c283-5fc6-b1c6-1bb9-7102c37e2d55@lab.ntt.co.jp
Note that I didn't say that we should Assert the equality of partsupfunc
members themselves, but rather whether we could add a comment explaining
why we don't. Although, like you say, we could Assert the equality of
function OIDs, so added a block like this:
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ {
+ int i;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+ }
+#endif
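The reasoning in that comment can be sketched with a simplified stand-in for FmgrInfo. MiniFmgrInfo and mini_support_funcs_match() are hypothetical names used only for illustration; the assumption is that two lookups of the same function can carry different private state (fn_extra here), so a bytewise comparison of the structs could fail even though they refer to the same function, while comparing fn_oid is stable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Hypothetical, simplified stand-in for FmgrInfo. */
typedef struct MiniFmgrInfo
{
	Oid			fn_oid;			/* identifies the function */
	void	   *fn_extra;		/* per-lookup private state, may differ */
} MiniFmgrInfo;

/*
 * Compare two arrays of support function entries by function OID only,
 * ignoring any per-lookup state.
 */
bool
mini_support_funcs_match(const MiniFmgrInfo *a, const MiniFmgrInfo *b,
						 int nkeys)
{
	int			i;

	assert(a != NULL && b != NULL && nkeys >= 0);

	for (i = 0; i < nkeys; i++)
	{
		if (a[i].fn_oid != b[i].fn_oid)
			return false;
	}

	return true;
}
```

Two entries with the same fn_oid but different fn_extra pointers compare as matching here, which is the property the Assert in the patch relies on.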
- Oid *parttypcoll; /* OIDs of collations of partition keys. */
+
+ /*
+ * We store both the collation implied by the partition key's type and the
+ * one specified for partitioning. Values in the former are used as
+ * varcollid in the Vars corresponding to simple column partition keys so
+ * as to make them match corresponding Vars appearing elsewhere in the
+ * query tree. Whereas, the latter is used when actually comparing values
+ * against partition bounds datums, such as, when doing partition pruning.
+ */
+ Oid *parttypcoll;
+ Oid *partcollation;
As you have already mentioned upthread only partcollation is needed, not
parttypcoll.
This hunk is gone after rebasing over 2af28e603319 (For partitionwise
join, match on partcollation, not parttypcoll) that was committed earlier
today.
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /*
+ * Cached array of partitioning comparison functions' fmgr structs. We
+ * don't compare these when trying to match two partition schemes.
+ */
I think this comment should go away. The second sentence doesn't explain why
and if it did so it should do that in find_partition_scheme() not here.
partsupfunc is another property of partition keys that is cached like
parttyplen, parttypbyval. Why does it deserve a separate comment when others
don't?
Replaced that comment with:
+ /* Cached information about partition comparison functions. */
As mentioned above, I already added a comment in find_partition_scheme().
On 2018/02/28 20:35, David Rowley wrote:
Micro review of v34:
1. Looks like you've renamed the parttypid parameter in the definition
of partkey_datum_from_expr and partition_cmp_args, but not updated the
declaration too.
+static bool partkey_datum_from_expr(Oid parttypid, Expr *expr, Datum *value);
+static bool
+partkey_datum_from_expr(Oid partopcintype, Expr *expr, Datum *value)
+static bool partition_cmp_args(Oid parttypid, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool
+partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
Oops, forgot about the declarations. Fixed.
2. In prune_append_rel_partitions(), it's not exactly illegal, but int
i is declared twice in different scopes. Looks like there's no need
for the inner one.
Removed the inner one.
Attached updated patches.
Thanks,
Amit
Attachments:
v35-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From abd78b8a1ce128a57d39d6bc193c6b5bfb3eb022 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v35 1/3] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 +++++++++++++++++++++++-
src/include/nodes/relation.h | 4 ++++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b799e249db..ac85a79023 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1886,7 +1886,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1904,6 +1904,23 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ {
+ int i;
+
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+ }
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1938,6 +1955,11 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d576aa7350..08a177dac4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v35-0002-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From b9eb3b453d2134356399b285751d7cf31fb90c67 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v35 2/3] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com)
Dilip Kumar (dilipbalaut@gmail.com),
---
src/backend/catalog/partition.c | 669 +++++++++++
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/clauses.c | 4 +-
src/backend/optimizer/util/partprune.c | 1517 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 89 ++
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/clauses.h | 2 +
src/include/optimizer/partprune.h | 71 ++
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 486 +++++++-
src/test/regress/sql/partition_prune.sql | 102 +-
15 files changed, 2950 insertions(+), 75 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index fcf7655553..021170a654 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,15 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1569,669 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_partitions_for_keys
+ * Returns the index of partitions that will need to be scanned for the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ Bitmapset *result;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result = get_partitions_for_keys_hash(context, keys);
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result = get_partitions_for_keys_list(context, keys);
+
+ /* Some partitions might have to be removed from result */
+ if (keys->n_ne_datums > 0)
+ {
+ Bitmapset *ne_parts;
+
+ /*
+ * Remove the indexes of any partitions which cannot possibly
+ * contain rows matching the clauses due to keys->ne_datums
+ * containing all datum values which are allowed in the given
+ * partition. This is only possible to do in LIST partitioning
+ * as it's the only partitioning strategy which allows the
+ * specification of exact values.
+ */
+ ne_parts = get_partitions_excluded_by_ne_datums(context,
+ keys->ne_datums,
+ keys->n_ne_datums);
+ result = bms_del_members(result, ne_parts);
+ bms_free(ne_parts);
+ }
+
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result = get_partitions_for_keys_range(context, keys);
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * get_partitions_for_keys_hash
+ * Return partitions of a hash partitioned table for requested
+ * keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the hash partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ int partnatts = context->partnatts,
+ nparts = context->nparts,
+ i;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ /*
+ * Since tuples with NULL values in the partition key columns are stored
+ * in regular partitions, we'll treat any IS NULL clauses here as regular
+ * equality clauses.
+ */
+ memset(keyisnull, false, sizeof(keyisnull));
+ i = -1;
+ while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
+ {
+ keys->n_eqkeys++;
+ Assert(i < partnatts);
+ keyisnull[i] = true;
+ }
+
+ /*
+ * Can only do pruning if we know all the keys and they're all equality
+ * keys including the nulls that we just counted above.
+ */
+ if (keys->n_eqkeys == partnatts)
+ {
+ uint64 rowHash;
+ int greatest_modulus = get_greatest_modulus(boundinfo),
+ result_index;
+
+ rowHash = compute_hash_value(partnatts, partsupfunc,
+ keys->eqkeys, keyisnull);
+ result_index = boundinfo->indexes[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, nparts - 1);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Return partitions of a list partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the list partitioning semantics.
+ *
+ * Note: LIST partitioning only supports a single partition key, therefore
+ * this function requires no looping over the partition keys.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Handle clauses requesting a NULL valued partition key */
+ if (!bms_is_empty(keys->keyisnull))
+ {
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /* Equality key. */
+ if (keys->n_eqkeys > 0)
+ {
+ eqoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->eqkeys[0],
+ &is_equal);
+ if (eqoff >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(boundinfo->indexes[eqoff] >= 0);
+ return bms_make_singleton(boundinfo->indexes[eqoff]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the left-most bound that satisfies the query, i.e., the one that
+ * satisfies minkeys.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->minkeys[0],
+ &is_equal);
+ if (minoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * minkeys[0] must be greater than or equal to the smallest datum.
+ * If we didn't find an exact matching datum (!is_equal) or if the
+ * operator used was non-inclusive (>), then in both of these
+ * cases we're not interested in the datum pointed to by minoff,
+ * but we may start getting matches in the partition which the
+ * next datum belongs to, so point to that one instead. (This may
+ * be beyond the last datum in the array, but we'll detect that
+ * later.)
+ */
+ if (!is_equal || !keys->min_incl)
+ minoff++;
+ }
+ else
+ {
+ /*
+ * minoff set to -1 means all datums are greater than minkeys[0],
+ * which means all partitions satisfy minkeys. In that case, set
+ * minoff to the index of the leftmost datum, viz. 0.
+ */
+ minoff = 0;
+ }
+
+ /*
+ * The value of minkeys[0] is greater than all of the datums we have
+ * partitions for. The only possible partition that could contain
+ * a match is the default partition. Return that, if it exists.
+ */
+ if (minoff > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ /*
+ * Find the right-most bound that satisfies the query, i.e., the one that
+ * satisfies maxkeys.
+ */
+ maxoff = boundinfo->ndatums - 1;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo, keys->maxkeys[0],
+ &is_equal);
+ if (maxoff >= 0)
+ {
+ /*
+ * partition_list_bsearch returning a non-negative number means that
+ * maxkeys[0] must be greater than or equal to the smallest datum.
+ * If the match found is an equal match, but the operator used is
+ * non-inclusive of that value (<), then the partition belonging
+ * to maxoff cannot match, so we'll decrement maxoff to point to
+ * the partition belonging to the previous datum. We might end up
+ * decrementing maxoff down to -1, but we'll handle that later.
+ */
+ if (is_equal && !keys->max_incl)
+ maxoff--;
+ }
+
+ /*
+ * maxkeys is smaller than the datums of all non-default partitions,
+ * meaning there isn't one to return. Return the default partition if
+ * one exists.
+ */
+ if (maxoff < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * All datums between those at minoff and maxoff satisfy query's keys, so
+ * add the corresponding partitions to the result set.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, boundinfo->indexes[i]);
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous manner,
+ * not all values in the given range will have a partition assigned. This
+ * may not technically be true for some data types (e.g. integer types),
+ * however, we currently lack any sort of infrastructure to provide us
+ * with proofs that would allow us to do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Return partitions of a range partitioned table for requested keys
+ *
+ * This interprets the keys and looks up partitions in the partition bound
+ * descriptor using the range partitioning semantics.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ PartScanKeyInfo *keys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *result = NULL;
+ int partnatts = context->partnatts,
+ i,
+ eqoff,
+ minoff,
+ maxoff;
+ bool is_equal;
+
+ /* Only the default range partition accepts nulls. */
+ if (!bms_is_empty(keys->keyisnull))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition, if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /* Equality keys. */
+ if (keys->n_eqkeys > 0)
+ {
+ /* Valid iff there are as many as partition key columns. */
+ Assert(keys->n_eqkeys == partnatts);
+ eqoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_eqkeys, keys->eqkeys,
+ &is_equal);
+ /*
+ * The bound at eqoff is known to be <= eqkeys, given the way
+ * partition_range_datum_bsearch works. Considering it as the lower
+ * bound of the partition that eqkeys falls into, the bound at
+ * eqoff + 1 would be its upper bound, so use eqoff + 1 to get the
+ * desired partition's index.
+ */
+ if (eqoff >= 0 && boundinfo->indexes[eqoff + 1] >= 0)
+ return bms_make_singleton(boundinfo->indexes[eqoff+1]);
+ /*
+ * eqkeys falls into a range of values for which no non-default
+ * partition exists.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * Find the leftmost bound that satisfies the query, that is, make minoff
+ * point to the datum corresponding to the upper bound of the left-most
+ * partition to be selected.
+ */
+ minoff = 0;
+ if (keys->n_minkeys > 0)
+ {
+ minoff = partition_range_datum_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ keys->n_minkeys, keys->minkeys,
+ &is_equal);
+
+ /*
+ * If minkeys does not contain values for all partition key columns,
+ * that is, only a prefix is specified, then there may be multiple
+ * bounds in boundinfo that share the same prefix. But
+ * partition_range_datum_bsearch would've returned the offset of just
+ * one of those. If minkey is inclusive, we must decrement minoff
+ * until it reaches the leftmost of those bound values, so that
+ * partitions corresponding to all those bound values are selected.
+ * If minkeys is exclusive, we must increment minoff until it reaches
+ * the first bound greater than this prefix, so that none of the
+ * partitions corresponding to those bound values are selected.
+ */
+ if (is_equal && keys->n_minkeys < partnatts)
+ {
+ while (minoff >= 1 && minoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->min_incl ? minoff - 1 : minoff + 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->minkeys,
+ keys->n_minkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->min_incl)
+ minoff++;
+ break;
+ }
+
+ if (keys->min_incl)
+ minoff--;
+ else
+ minoff++;
+ }
+ }
+ /*
+ * Assuming minoff currently points to the lower bound of the left-
+ * most selected partition, increment it so that it points to the
+ * upper bound.
+ */
+ else
+ minoff++;
+ }
+
+ /*
+ * Find the rightmost bound that satisfies the query, that is, make maxoff
+ * point to the datum corresponding to the upper bound of the
+ * right-most partition to be selected.
+ */
+ maxoff = boundinfo->ndatums;
+ if (keys->n_maxkeys > 0)
+ {
+ maxoff = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ keys->n_maxkeys, keys->maxkeys,
+ &is_equal);
+
+ /* See the comment written above for minkeys. */
+ if (is_equal && keys->n_maxkeys < partnatts)
+ {
+ while (maxoff >= 1 && maxoff < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = keys->max_incl ? maxoff + 1 : maxoff - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ keys->maxkeys,
+ keys->n_maxkeys);
+ if (cmpval != 0)
+ {
+ /* Move to the non-equal bound only in this case. */
+ if (!keys->max_incl)
+ maxoff--;
+ break;
+ }
+
+ if (keys->max_incl)
+ maxoff++;
+ else
+ maxoff--;
+ }
+
+ /*
+ * Assuming maxoff currently points to the lower bound of the
+ * right-most partition, increment it so that it points to the
+ * upper bound.
+ */
+ maxoff++;
+ }
+ /*
+ * Assuming maxoff currently points to the lower bound of the right-
+ * most selected partition, increment it so that it points to the
+ * upper bound. We do not need to include that partition though if
+ * maxkeys exactly matched the bound in question and it is exclusive.
+ * Not incrementing simply means we treat the matched bound itself
+ * the upper bound of the right-most selected partition.
+ */
+ else if (!is_equal || keys->max_incl)
+ maxoff++;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+
+ /*
+ * At this point, minoff/maxoff supposedly point to the upper bound of
+ * some partition, but that may not be the case. Each might actually be
+ * the upper bound of an unassigned range of values; if so, move
+ * minoff/maxoff to the adjacent bound, which must be the upper bound of
+ * a valid partition.
+ *
+ * By doing that, we skip over a portion of values that do indeed satisfy
+ * the query, but don't have a valid partition assigned. The default
+ * partition will have to be included to cover those values. Although, if
+ * the original bound in question contains an infinite value, there would
+ * not be any unassigned range to speak of, because the range is unbounded
+ * in that direction by definition, so no need to include the default.
+ */
+ if (boundinfo->indexes[minoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_minkeys > 0)
+ lastkey = keys->n_minkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && boundinfo->indexes[maxoff] < 0)
+ {
+ int lastkey;
+
+ if (keys->n_maxkeys > 0)
+ lastkey = keys->n_maxkeys - 1;
+ else
+ lastkey = partnatts - 1;
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] == PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, boundinfo->default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ boundinfo->indexes[minoff],
+ boundinfo->indexes[maxoff]);
+
+ if (!partition_bound_has_default(boundinfo))
+ return result;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (boundinfo->indexes[i] < 0)
+ return bms_add_member(result, boundinfo->default_index);
+ }
+
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys
+ * could be null.
+ */
+ if (bms_num_members(keys->keyisnotnull) < partnatts)
+ result = bms_add_member(result, boundinfo->default_index);
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
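The counting approach used by get_partitions_excluded_by_ne_datums() above can be illustrated outside the backend. What follows is only a minimal sketch under simplifying assumptions, not the patch's code: plain arrays stand in for PartitionBoundInfo and Bitmapset, and the names exclude_by_ne, part_of_datum and ne_found are hypothetical. The default partition owns no list datums, so both of its counters stay at zero and it is never excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified model of the <> exclusion logic.
 *
 * part_of_datum[i] is the index of the partition that accepts the i-th list
 * bound datum; ne_found[i] is true if the query contains "key <> datum_i".
 * On return, excluded[p] is true for every partition p whose permitted
 * values are all ruled out by <> clauses.
 *
 * Returns the number of excluded partitions, or -1 if allocation fails.
 */
int
exclude_by_ne(const int *part_of_datum, const bool *ne_found, int ndatums,
              int nparts, bool *excluded)
{
    int *datums_in_part;
    int *datums_found;
    int  nexcluded = 0;
    int  i;

    if (nparts <= 0)
        return 0;

    datums_in_part = calloc((size_t) nparts, sizeof(int));
    datums_found = calloc((size_t) nparts, sizeof(int));
    if (datums_in_part == NULL || datums_found == NULL)
    {
        free(datums_in_part);
        free(datums_found);
        return -1;
    }

    /* One pass over the bounds: count permitted and ruled-out datums. */
    for (i = 0; i < ndatums; i++)
    {
        int part = part_of_datum[i];

        assert(part >= 0 && part < nparts);
        datums_in_part[part]++;
        if (ne_found[i])
            datums_found[part]++;
    }

    /*
     * A partition goes away only if every value it permits was ruled out.
     * Requiring a non-zero count protects partitions that own no datums,
     * such as the default partition.
     */
    for (i = 0; i < nparts; i++)
    {
        excluded[i] = datums_found[i] > 0 &&
                      datums_found[i] >= datums_in_part[i];
        if (excluded[i])
            nexcluded++;
    }

    free(datums_in_part);
    free(datums_found);
    return nexcluded;
}
```

For instance, with partition 0 accepting two values, partition 1 accepting one, and partition 2 acting as the default, a <> clause on the single value of partition 1 excludes that partition alone, whereas a <> clause on just one of the two values of partition 0 excludes nothing.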
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a00eb..542c4a2bca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
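The set_append_rel_size() hunk above skips any child whose RT index is absent from live_children before constraint exclusion is even attempted. A rough standalone sketch of that filtering step follows, assuming a 64-bit mask as a stand-in for Relids; relid_set, relid_is_member and mark_pruned_children are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a Bitmapset, limited to RT indexes 0..63. */
typedef uint64_t relid_set;

static bool
relid_is_member(int relid, relid_set set)
{
    return relid >= 0 && relid < 64 && ((set >> relid) & 1u) != 0;
}

/*
 * Walk the child RT indexes of an append relation.  Children missing from
 * live_children are flagged in is_dummy[], mirroring the early "continue"
 * in the hunk; the rest would go on to the usual per-child processing.
 *
 * Returns the number of children that still need to be considered.
 */
int
mark_pruned_children(const int *child_relids, int nchildren,
                     relid_set live_children, bool *is_dummy)
{
    int nlive = 0;
    int i;

    assert(nchildren == 0 || (child_relids != NULL && is_dummy != NULL));

    for (i = 0; i < nchildren; i++)
    {
        is_dummy[i] = !relid_is_member(child_relids[i], live_children);
        if (!is_dummy[i])
            nlive++;
    }

    return nlive;
}
```

The point of doing this check first is that a bitmap membership test is far cheaper than running constraint exclusion on a child that pruning has already ruled out.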
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 89f27ce0eb..0c1f23951a 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
static Node *substitute_actual_parameters_mutator(Node *node,
substitute_actual_parameters_context *context);
static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
- Oid result_collation);
static Query *substitute_actual_srf_parameters(Query *expr,
int nargs, List *args);
static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4833,7 +4831,7 @@ sql_inline_error_callback(void *arg)
* We use the executor's routine ExecEvalExpr() to avoid duplication of
* code and ensure we get the same result as the executor would get.
*/
-static Expr *
+Expr *
evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
Oid result_collation)
{
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..70a215ff3f
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1517 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * This module has the following entry points.
+ *
+ * prune_append_rel_partitions()
+ *
+ * This is to be called for a partitioned table to prune away the partitions
+ * that provably won't be scanned by a given query based on the table's
+ * rel->baserestrictinfo. It should be called before starting to look at the
+ * individual partitions to set their access paths, so that we expend planning
+ * efforts only on the partitions that are relevant to the query. Pruning by
+ * this function only occurs if rel->baserestrictinfo contains at least one
+ * clause whose variable argument matches a proper prefix of the table's
+ * partition key and the other argument is a Const node.
+ *
+ * generate_partition_clauses()
+ *
+ * This is to be called to extract clauses that will be useful for partition
+ * pruning from a list of clauses containing clauses that reference a given
+ * partitioned table. For example, prune_append_rel_partitions() calls this
+ * function, because a partitioned table's rel->baserestrictinfo may contain
+ * clauses that might be useful for partition pruning. The list of clauses is
+ * processed and a PartitionClauseInfo is returned which contains details of
+ * any clauses which could be matched to the partition keys of the relation
+ * defined in the context.
+ *
+ * get_partitions_from_clauses()
+ *
+ * This is to be called to prune partitions based on 'partclauseinfo'. Caller
+ * must have called generate_partition_clauses() in order to have generated
+ * a valid PartitionClauseInfo. Partition pruning proceeds by extracting
+ * constant values from the clauses and comparing them with the partition bounds
+ * while also taking into account strategies of the operators in the matched
+ * clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key column kept to
+ * avoid recomputing it in remove_redundant_clauses().
+ */
+typedef struct PartClause
+{
+ Oid opno; /* opno to compare partkey to 'value' */
+ Oid inputcollid; /* collation to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ bool valid_cache; /* Are the following fields populated? */
+ int op_strategy;
+ Oid op_subtype;
+ FmgrInfo op_func;
+} PartClause;
+
+/*
+ * Strategy of a partition clause operator per the partitioning operator class
+ * definition.
+ */
+typedef enum PartOpStrategy
+{
+ PART_OP_EQUAL,
+ PART_OP_LESS,
+ PART_OP_GREATER
+} PartOpStrategy;
+
+static PartitionClauseInfo *extract_partition_clauses(
+ PartitionPruneContext *context,
+ List *clauses);
+static bool match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop);
+static Bitmapset *get_partitions_from_or_args(PartitionPruneContext *context,
+ List *or_args,
+ List *or_arg_partclauseinfos);
+static void remove_redundant_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo,
+ List **minimalclauses);
+static bool partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result);
+static bool extract_bounding_datums(PartitionPruneContext *context,
+ PartitionClauseInfo *clauseinfo,
+ List **minimalclauses, PartScanKeyInfo *keys);
+static PartOpStrategy partition_op_strategy(char part_strategy,
+ PartClause *pc, bool *incl);
+static bool partkey_datum_from_expr(Oid partopcintype, Expr *expr,
+ Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ PartitionClauseInfo *partclauseinfo;
+ int partnatts = rel->part_scheme->partnatts;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses */
+ partclauseinfo = generate_partition_clauses(&context, clauses);
+
+ if (!partclauseinfo->constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes;
+
+ partindexes = get_partitions_from_clauses(&context,
+ partclauseinfo);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_clauses
+ * Processes 'clauses' and returns a PartitionClauseInfo which contains
+ * the details of any clauses which were matched to partition keys.
+ */
+PartitionClauseInfo *
+generate_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* pre-process the clauses and generate the PartitionClauseInfo */
+ return extract_partition_clauses(context, clauses);
+}
+
+/*
+ * get_partitions_from_clauses
+ * Determine partitions that could possibly contain a record that
+ * satisfies clauses as described in partclauseinfo
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_partitions_from_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo)
+{
+ PartScanKeyInfo keys;
+ Bitmapset *result;
+ ListCell *lc1,
+ *lc2;
+
+ Assert(partclauseinfo != NULL);
+ Assert(!partclauseinfo->constfalse);
+
+ if (!partclauseinfo->foundkeyclauses)
+ {
+ /* No interesting clauses were found to eliminate partitions. */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ else
+ {
+ List *minimalclauses[PARTITION_MAX_KEYS];
+
+ /*
+ * For each partition key column, populate its element in
+ * minimalclauses with the most restrictive set of the clauses from
+ * the corresponding partition key in partclauseinfo.
+ */
+ remove_redundant_clauses(context, partclauseinfo, minimalclauses);
+
+ /* Did remove_redundant_clauses find any contradicting clauses? */
+ if (partclauseinfo->constfalse)
+ return NULL;
+
+ if (extract_bounding_datums(context, partclauseinfo, minimalclauses,
+ &keys))
+ {
+ result = get_partitions_for_keys(context, &keys);
+
+ /*
+ * No point in trying to look at other conjunctive clauses, if we
+ * got an empty set in the first place.
+ */
+ if (bms_is_empty(result))
+ return NULL;
+ }
+ else
+ {
+ /*
+ * Looks like we didn't have *all* the values we'd need to
+ * prune partitions using get_partitions_for_keys().
+ */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ }
+ }
+
+ /* Now apply the OR clauses. */
+ forboth(lc1, partclauseinfo->or_clauses,
+ lc2, partclauseinfo->or_partclauseinfos)
+ {
+ List *or_args = (List *) lfirst(lc1);
+ List *or_arg_partclauseinfos = (List *) lfirst(lc2);
+ Bitmapset *or_parts;
+
+ or_parts = get_partitions_from_or_args(context, or_args,
+ or_arg_partclauseinfos);
+
+ /*
+ * Clauses in or_clauses are mutually conjunctive and also in
+ * conjunction with the rest of the clauses above, so combine the
+ * partitions thus selected with those in result using set
+ * intersection.
+ */
+ result = bms_int_members(result, or_parts);
+ bms_free(or_parts);
+ }
+
+ return result;
+}
+
+/* Module-local functions */
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * extract_partition_clauses
+ * Processes 'clauses' to extract clauses matching the partition key.
+ * Returns a PartitionClauseInfo which stores the clauses which were
+ * matched to the partition key. The PartitionClauseInfo also collects
+ * other useful clauses to assist in partition elimination, such as OR
+ * clauses, clauses containing <> operator, and IS [NOT] NULL clauses.
+ *
+ * We may also discover some contradiction in the clauses which means that no
+ * partition can possibly match. In this case, the function sets the
+ * returned PartitionClauseInfo's 'constfalse' to true and exits immediately
+ * without processing any further clauses. The caller must then be careful
+ * not to assume that the returned struct is fully populated with all clauses.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of important lists before passing them to this
+ * function.
+ */
+static PartitionClauseInfo *
+extract_partition_clauses(PartitionPruneContext *context, List *clauses)
+{
+ PartitionClauseInfo *partclauseinfo;
+ ListCell *lc;
+
+ partclauseinfo = palloc(sizeof(PartitionClauseInfo));
+ memset(partclauseinfo->keyclauses, 0, sizeof(partclauseinfo->keyclauses));
+ partclauseinfo->or_clauses = NIL;
+ partclauseinfo->or_partclauseinfos = NIL;
+ partclauseinfo->ne_clauses = NIL;
+ partclauseinfo->keyisnull = NULL;
+ partclauseinfo->keyisnotnull = NULL;
+ partclauseinfo->constfalse = false;
+ partclauseinfo->foundkeyclauses = false;
+
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ ((BoolExpr *) clause)->args);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+ /* Fall-through for a NOT clause, which is handled below. */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ PartClause *pc;
+ Oid partopfamily = context->partopfamily[i];
+ Oid partcoll = context->partcollation[i];
+ Oid commutator = InvalidOid;
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (IsBooleanOpfamily(partopfamily))
+ {
+ Expr *rightop;
+
+ if (match_boolean_partition_clause(clause, partkey, &rightop))
+ {
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = BooleanEqualOperator;
+ pc->inputcollid = InvalidOid;
+ pc->value = rightop;
+
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ continue;
+ }
+ }
+
+ if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop,
+ *valueexpr;
+ bool is_ne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ valueexpr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ valueexpr = leftop;
+
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ break;
+ }
+ else
+ /* clause does not match this partition key. */
+ continue;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * break instead of continuing to match the clause with the
+ * next key, because the clause is useless no matter which key
+ * it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ break;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) valueexpr))
+ break;
+
+ /*
+ * Normally we only bother with operators that are listed as
+ * being part of the partitioning operator family. But we
+ * make an exception in one case -- operators named '<>' are
+ * not listed in any operator family whatsoever, in which
+ * case, we try to perform partition pruning with it only if
+ * list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ /*
+ * To confirm if the operator is really '<>', check if its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false,
+ &strategy,
+ &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_ne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_ne_listp)
+ break;
+ }
+
+ pc = (PartClause *) palloc0(sizeof(PartClause));
+ pc->opno = OidIsValid(commutator) ? commutator : opclause->opno;
+ pc->inputcollid = opclause->inputcollid;
+ pc->value = valueexpr;
+
+ /*
+ * We don't turn a <> operator clause into a key right away.
+ * Instead, the caller will hand over such clauses to
+ * get_partitions_excluded_by_ne_clauses().
+ */
+ if (is_ne_listp)
+ partclauseinfo->ne_clauses =
+ lappend(partclauseinfo->ne_clauses,
+ pc);
+ else
+ partclauseinfo->keyclauses[i] =
+ lappend(partclauseinfo->keyclauses[i],
+ pc);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ /* Record that a strict clause has been seen for this key */
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ continue;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ break;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ break;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ break;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) &&
+ op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ break;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array
+ * element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ break;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build the OR clause if needed or add the clauses to the end
+ * of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ partclauseinfo->or_clauses =
+ lappend(partclauseinfo->or_clauses,
+ elem_clauses);
+ else
+ clauses = list_concat(clauses, elem_clauses);
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match this partition key column? */
+ if (equal(arg, partkey))
+ {
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnotnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+ partclauseinfo->keyisnull =
+ bms_add_member(partclauseinfo->keyisnull,
+ i);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(i, partclauseinfo->keyisnull))
+ {
+ partclauseinfo->constfalse = true;
+ return partclauseinfo;
+ }
+
+ partclauseinfo->keyisnotnull =
+ bms_add_member(partclauseinfo->keyisnotnull,
+ i);
+ }
+ }
+ }
+
+ /* Clause was matched. */
+ partclauseinfo->foundkeyclauses = true;
+ }
+ }
+
+ /*
+ * Now pre-process any OR clauses found above and generate
+ * PartitionClauseInfos for them.
+ */
+ foreach(lc, partclauseinfo->or_clauses)
+ {
+ List *or_args = lfirst(lc);
+ List *pclauselist = NIL;
+ ListCell *lc2;
+
+ foreach (lc2, or_args)
+ {
+ List *clauses = list_make1(lfirst(lc2));
+ PartitionClauseInfo *orpartclauseinfo;
+
+ orpartclauseinfo = extract_partition_clauses(context, clauses);
+ pclauselist = lappend(pclauselist, orpartclauseinfo);
+ }
+
+ partclauseinfo->or_partclauseinfos =
+ lappend(partclauseinfo->or_partclauseinfos, pclauselist);
+ }
+
+ return partclauseinfo;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_from_or_args
+ *
+ * Returns the set of indexes of partitions, each of which satisfies some
+ * clause in or_args.
+ */
+static Bitmapset *
+get_partitions_from_or_args(PartitionPruneContext *context, List *or_args,
+ List *or_arg_partclauseinfos)
+{
+ Bitmapset *result = NULL;
+ ListCell *lc1,
+ *lc2;
+
+ /*
+ * When matching an OR expression, we only check that at least one of its
+ * args matches the partition key, not all of them. For an argument that
+ * doesn't match, we cannot eliminate any partitions using
+ * get_partitions_from_clauses(). However, if the table is itself a
+ * partition, we may be able to prove using constraint exclusion that the
+ * clause refutes its partition constraint, that is, we can eliminate all
+ * of its partitions.
+ */
+ forboth(lc1, or_args, lc2, or_arg_partclauseinfos)
+ {
+ List *clauses = list_make1(lfirst(lc1));
+ PartitionClauseInfo *or_arg_partclauseinfo = lfirst(lc2);
+ Bitmapset *arg_partset;
+
+ if (!or_arg_partclauseinfo->foundkeyclauses)
+ {
+ List *partconstr = context->partition_qual;
+
+ if (partconstr)
+ {
+ partconstr = (List *) expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1, context->relid, 0);
+ if (predicate_refuted_by(partconstr, clauses, false))
+ continue;
+ }
+
+ /* Couldn't eliminate any of the partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ if (!or_arg_partclauseinfo->constfalse)
+ arg_partset = get_partitions_from_clauses(context,
+ or_arg_partclauseinfo);
+ else
+ arg_partset = NULL;
+
+ result = bms_add_members(result, arg_partset);
+ bms_free(arg_partset);
+ }
+
+ return result;
+}
+
+/*
+ * remove_redundant_clauses
+ * Process 'partclauseinfo' to remove the clauses that are superseded
+ * by other clauses which are more restrictive.
+ *
+ * Finished lists of clauses are returned in *minimalclauses which is an array
+ * with one slot for each of the partition keys.
+ *
+ * For example, x > 1 AND x > 2 and x >= 5, the latter is the most
+ * restrictive, so 5 is the best minimum bound for x.
+ *
+ * We also look for clauses which contradict one another in a way that proves
+ * that the clauses cannot possibly match any partition. Impossible clauses
+ * include things like: x = 1 AND x = 2, or x = 5 AND x < 3. The function
+ * returns right after finding such a clause and before returning, sets
+ * constfalse in 'partclauseinfo' to inform the caller that we found such a
+ * clause.
+ */
+static void
+remove_redundant_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo,
+ List **minimalclauses)
+{
+ PartClause *hash_clause,
+ *btree_clauses[BTMaxStrategyNumber];
+ ListCell *lc;
+ int s;
+ int i;
+ bool test_result;
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *keyclauses = partclauseinfo->keyclauses[i];
+
+ minimalclauses[i] = NIL;
+ hash_clause = NULL;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+
+ foreach(lc, keyclauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+
+ if (!pc->valid_cache)
+ {
+ Oid lefttype;
+
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &pc->op_subtype);
+ fmgr_info(get_opcode(pc->opno), &pc->op_func);
+ pc->valid_cache = true;
+ }
+
+ /*
+ * Hash-partitioning knows only about equality. So, if we've
+ * matched a clause and found another clause whose constant
+ * operand doesn't match the constant operand of the former, then
+ * we have found mutually contradictory clauses.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ if (hash_clause == NULL)
+ hash_clause = pc;
+ /* check if another clause would contradict the one we have */
+ else if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, hash_clause,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ }
+ /*
+ * Couldn't compare; keep hash_clause set to the previous value,
+ * and add this one directly to the result. Caller would
+ * arbitrarily choose one of the many and perform
+ * partition-pruning with it.
+ */
+ else
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+
+ /*
+ * The code below handles btree operators, so not relevant for
+ * hash partitioning.
+ */
+ continue;
+ }
+
+ /*
+ * The code that follows closely mimics similar processing done by
+ * nbtutils.c: _bt_preprocess_keys().
+ *
+ * btree_clauses[s] points to the currently best clause containing the
+ * operator strategy type s+1; it is NULL if we haven't yet found
+ * such a clause.
+ */
+ s = pc->op_strategy - 1;
+ if (btree_clauses[s] == NULL)
+ {
+ btree_clauses[s] = pc;
+ }
+ else
+ {
+ /*
+ * Is this one more restrictive than what we already have?
+ *
+ * Consider some examples: 1. If btree_clauses[BTLT] now contains
+ * a < 5, and pc is a < 3, then because 3 < 5 is true, a < 5
+ * currently at btree_clauses[BTLT] will be replaced by a < 3.
+ *
+ * 2. If btree_clauses[BTEQ] now contains a = 5 and pc is a = 7,
+ * then because 5 = 7 is false, we found a mutual contradiction,
+ * so we set constfalse in 'partclauseinfo' to true and return.
+ *
+ * 3. If btree_clauses[BTLT] now contains a < 5 and pc is a < 7,
+ * then because 7 < 5 is false, we leave a < 5 where it is and
+ * effectively discard a < 7 as being redundant.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ pc, pc, btree_clauses[s],
+ &test_result))
+ {
+ /* pc is more restrictive, so replace the existing. */
+ if (test_result)
+ btree_clauses[s] = pc;
+ else if (s == BTEqualStrategyNumber - 1)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+
+ /* Old one is more restrictive, so keep around. */
+ }
+ else
+ {
+ /*
+ * We couldn't determine which one is more restrictive. Keep
+ * the previous one in btree_clauses[s] and push this one directly
+ * to the output list.
+ */
+ minimalclauses[i] = lappend(minimalclauses[i], pc);
+ }
+ }
+ }
+
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ {
+ /* Note we didn't add this one to the result yet. */
+ if (hash_clause)
+ minimalclauses[i] = lappend(minimalclauses[i], hash_clause);
+ continue;
+ }
+
+ /* Compare btree operator clauses across strategies. */
+
+ /* Compare the equality clause with clauses of other strategies. */
+ if (btree_clauses[BTEqualStrategyNumber - 1])
+ {
+ PartClause *eq = btree_clauses[BTEqualStrategyNumber - 1];
+
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ PartClause *chk = btree_clauses[s];
+
+ if (!chk || s == (BTEqualStrategyNumber - 1))
+ continue;
+
+ /*
+ * Suppose btree_clauses[BTLT] contained a < 5 and the eq clause
+ * is a = 5, then because 5 < 5 is false, we found contradiction.
+ * That is, a < 5 and a = 5 are mutually contradictory. OTOH, if
+ * eq clause is a = 3, then because 3 < 5, we no longer need
+ * a < 5, because a = 3 is more restrictive.
+ */
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ chk, eq, chk,
+ &test_result))
+ {
+ if (!test_result)
+ {
+ partclauseinfo->constfalse = true;
+ return;
+ }
+ /* Discard the no longer needed clause. */
+ btree_clauses[s] = NULL;
+ }
+ }
+ }
+
+ /*
+ * Try to keep only one of <, <=.
+ *
+ * Suppose btree_clauses[BTLT] contains a < 3 and btree_clauses[BTLE]
+ * contains a <= 3 (or a <= 4), then because 3 <= 3 (or 3 <= 4) is true,
+ * we discard the a <= 3 (or a <= 4) as redundant. If the latter
+ * contains a <= 2, then because 3 <= 2 is false, we discard a < 3 as
+ * redundant.
+ */
+ if (btree_clauses[BTLessStrategyNumber - 1] &&
+ btree_clauses[BTLessEqualStrategyNumber - 1])
+ {
+ PartClause *lt = btree_clauses[BTLessStrategyNumber - 1],
+ *le = btree_clauses[BTLessEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ le, lt, le,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTLessEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTLessStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /* Try to keep only one of >, >=. See the example above. */
+ if (btree_clauses[BTGreaterStrategyNumber - 1] &&
+ btree_clauses[BTGreaterEqualStrategyNumber - 1])
+ {
+ PartClause *gt = btree_clauses[BTGreaterStrategyNumber - 1],
+ *ge = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ if (partition_cmp_args(context->partopcintype[i],
+ context->partopfamily[i],
+ ge, gt, ge,
+ &test_result))
+ {
+ if (test_result)
+ btree_clauses[BTGreaterEqualStrategyNumber - 1] = NULL;
+ else
+ btree_clauses[BTGreaterStrategyNumber - 1] = NULL;
+ }
+ }
+
+ /*
+ * btree_clauses now contains the "best" clause or NULL for each btree
+ * strategy number. Add them to minimalclauses[i].
+ */
+ for (s = 0; s < BTMaxStrategyNumber; s++)
+ {
+ if (btree_clauses[s])
+ minimalclauses[i] = lappend(minimalclauses[i],
+ btree_clauses[s]);
+ }
+ }
+}
+
+/*
+ * partition_cmp_args
+ * Try to compare the constant arguments of 'leftarg' and 'rightarg', in
+ * that order, using the operator of 'pc' and set *result to the result
+ * of this comparison.
+ *
+ * Returns true if we could actually perform the comparison; otherwise false.
+ *
+ * Note: We may not be able to perform the comparison if operand values are
+ * unknown in this context or if the type of any of the operands is
+ * incompatible with the operator.
+ */
+static bool
+partition_cmp_args(Oid partopcintype, Oid partopfamily,
+ PartClause *pc, PartClause *leftarg, PartClause *rightarg,
+ bool *result)
+{
+ Datum left_value;
+ Datum right_value;
+
+ Assert(pc->valid_cache && leftarg->valid_cache && rightarg->valid_cache);
+
+ /*
+ * Try to extract an actual value from each arg. This may fail if the
+ * value is unknown in this context, in which case we cannot compare.
+ */
+ if (!partkey_datum_from_expr(partopcintype, leftarg->value, &left_value))
+ return false;
+
+ if (!partkey_datum_from_expr(partopcintype, rightarg->value, &right_value))
+ return false;
+
+ /*
+ * We can compare left_value and right_value using pc's operator
+ * only if both are of the expected type.
+ */
+ if (leftarg->op_subtype == pc->op_subtype &&
+ rightarg->op_subtype == pc->op_subtype)
+ {
+ *result = DatumGetBool(FunctionCall2Coll(&pc->op_func,
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ else
+ {
+ Oid cmp_op;
+
+ /* Otherwise, look one up in the partitioning operator family. */
+ cmp_op = get_opfamily_member(partopfamily,
+ leftarg->op_subtype,
+ rightarg->op_subtype,
+ pc->op_strategy);
+ if (OidIsValid(cmp_op))
+ {
+ *result = DatumGetBool(OidFunctionCall2Coll(get_opcode(cmp_op),
+ pc->inputcollid,
+ left_value,
+ right_value));
+ return true;
+ }
+ }
+
+ /* Couldn't do the comparison. */
+ *result = false;
+ return false;
+}
+
+/*
+ * extract_bounding_datums
+ * Process 'clauseinfo' and populate 'keys' with all
+ * min/max/equal/not-equal values that we're able to
+ * determine.
+ *
+ * *minimalclauses is an array with partnatts members, each of which is a list
+ * of the most restrictive clauses of each operator strategy for the given
+ * partition key.
+ *
+ * For RANGE partitioning we do not need to match and find values for all
+ * partition keys. We may be able to eliminate some partitions with just a
+ * prefix of the partition keys. HASH partitioning does require that all keys
+ * be matched by some combination of equality clauses and IS NULL
+ * clauses. LIST partitions don't support multiple partition keys.
+ *
+ * Returns true if at least one key was found; false otherwise.
+ */
+static bool
+extract_bounding_datums(PartitionPruneContext *context,
+ PartitionClauseInfo *clauseinfo,
+ List **minimalclauses, PartScanKeyInfo *keys)
+{
+ bool need_next_eq,
+ need_next_min,
+ need_next_max;
+ int i;
+ ListCell *lc;
+
+ /*
+ * Based on the strategies of the clauses' operators (=, </<=, >/>=), try
+ * to construct a tuple of those datums that serve as the exact lookup
+ * tuple or two tuples that serve as minimum and maximum bound.
+ *
+ * If we find datums for all partition key columns that appear in =
+ * operator clauses, then we have the exact match lookup tuple, which will
+ * be used to match just one partition (although that's required only for
+ * range partitioning, finding datums for just some columns is fine for
+ * hash partitioning).
+ *
+ * If the last datum in a tuple comes from a clause containing </<= or
+ * >/>= operator, then that constitutes the minimum or maximum bound tuple,
+ * respectively. There is one exception -- if we have a tuple containing
+ * values for only a prefix of partition key columns, where none of its
+ * values come from a </<= or >/>= operator clause, we still consider such
+ * tuple as both minimum and maximum bound tuple.
+ */
+ need_next_eq = true;
+ need_next_min = true;
+ need_next_max = true;
+ memset(keys, 0, sizeof(PartScanKeyInfo));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ List *clauselist = minimalclauses[i];
+
+ /*
+ * Min and max keys must constitute a prefix of the partition key and
+ * must appear in the same order as partition keys. Equal keys have
+ * to satisfy that requirement only for non-hash partitioning.
+ */
+ if (i > keys->n_eqkeys &&
+ context->strategy != PARTITION_STRATEGY_HASH)
+ need_next_eq = false;
+
+ if (i > keys->n_minkeys)
+ need_next_min = false;
+
+ if (i > keys->n_maxkeys)
+ need_next_max = false;
+
+ foreach(lc, clauselist)
+ {
+ PartClause *clause = (PartClause *) lfirst(lc);
+ Expr *value = clause->value;
+ bool incl;
+ PartOpStrategy op_strategy;
+
+ op_strategy = partition_op_strategy(context->strategy, clause,
+ &incl);
+ switch (op_strategy)
+ {
+ case PART_OP_EQUAL:
+ Assert(incl);
+ if (need_next_eq &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->eqkeys[i]))
+ keys->n_eqkeys++;
+
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = true;
+ }
+
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = true;
+ }
+ break;
+
+ case PART_OP_LESS:
+ if (need_next_max &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->maxkeys[i]))
+ {
+ keys->n_maxkeys++;
+ keys->max_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_max = false;
+ }
+ break;
+
+ case PART_OP_GREATER:
+ if (need_next_min &&
+ partkey_datum_from_expr(context->partopcintype[i],
+ value, &keys->minkeys[i]))
+ {
+ keys->n_minkeys++;
+ keys->min_incl = incl;
+ if (!incl)
+ need_next_eq = need_next_min = false;
+ }
+ break;
+
+ default:
+ Assert(false);
+ }
+ }
+ }
+
+ /*
+ * To set eqkeys, we must have found matching clauses containing =
+ * operator for all partition key columns and if present we don't need
+ * the values in minkeys and maxkeys anymore. In the case of hash
+ * partitioning, we don't require all of eqkeys to be operator clauses.
+ * In that case, any IS NULL clauses involving partition key columns are
+ * also considered as equality keys by the code for hash partition pruning,
+ * which checks that all partition columns are covered before actually
+ * performing the pruning.
+ */
+ if (keys->n_eqkeys == context->partnatts ||
+ context->strategy == PARTITION_STRATEGY_HASH)
+ keys->n_minkeys = keys->n_maxkeys = 0;
+ else
+ keys->n_eqkeys = 0;
+
+ /* Collect datums from <> operator clauses in their dedicated array. */
+ if (clauseinfo->ne_clauses)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ keys->ne_datums = (Datum *)
+ palloc(list_length(clauseinfo->ne_clauses) *
+ sizeof(Datum));
+ i = 0;
+ foreach(lc, clauseinfo->ne_clauses)
+ {
+ PartClause *pc = (PartClause *) lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context->partopcintype[0],
+ pc->value, &datum))
+ keys->ne_datums[i++] = datum;
+ }
+ keys->n_ne_datums = i;
+ }
+
+ /* Finally, also set the keyisnull and keyisnotnull values. */
+ keys->keyisnull = clauseinfo->keyisnull;
+ keys->keyisnotnull = clauseinfo->keyisnotnull;
+
+ return (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
+ keys->n_maxkeys > 0 || keys->n_ne_datums > 0 ||
+ !bms_is_empty(keys->keyisnull) ||
+ !bms_is_empty(keys->keyisnotnull));
+}
+
+/*
+ * partition_op_strategy
+ * Returns whether the clause in 'pc' contains an =, </<=, or >/>=
+ * operator and sets *incl to true if the operator's strategy is
+ * inclusive.
+ */
+static PartOpStrategy
+partition_op_strategy(char part_strategy, PartClause *pc, bool *incl)
+{
+ *incl = false; /* may be overwritten below */
+
+ switch (part_strategy)
+ {
+ /* Hash partitioning allows only hash equality. */
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy == HTEqualStrategyNumber)
+ {
+ *incl = true;
+ return PART_OP_EQUAL;
+ }
+ elog(ERROR, "unexpected operator strategy number: %d",
+ pc->op_strategy);
+
+ /* List and range partitioning support all btree operators. */
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTLessStrategyNumber:
+ return PART_OP_LESS;
+
+ case BTEqualStrategyNumber:
+ *incl = true;
+ return PART_OP_EQUAL;
+
+ case BTGreaterEqualStrategyNumber:
+ *incl = true;
+ /* fall through */
+
+ case BTGreaterStrategyNumber:
+ return PART_OP_GREATER;
+ }
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) part_strategy);
+ }
+
+ return PART_OP_EQUAL; /* keep compiler quiet */
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(Oid partopcintype, Expr *expr, Datum *value)
+{
+ Oid exprtype = exprType((Node *) expr);
+
+ if (exprtype != partopcintype)
+ {
+ ParseState *pstate = make_parsestate(NULL);
+
+ expr = (Expr *) coerce_to_target_type(pstate, (Node *) expr,
+ exprtype,
+ partopcintype, -1,
+ COERCION_EXPLICIT,
+ COERCE_IMPLICIT_CAST, -1);
+ free_parsestate(pstate);
+
+ /*
+ * If we couldn't coerce to the partition key's type, that is, the
+ * type of the datums stored in PartitionBoundInfo for this partition
+ * key, there's no hope of using this expression for anything
+ * partitioning-related.
+ */
+ if (expr == NULL)
+ return false;
+
+ /*
+ * Transform into a form that the following code can do something
+ * useful with.
+ */
+ expr = evaluate_expr(expr,
+ exprType((Node *) expr),
+ exprTypmod((Node *) expr),
+ exprCollation((Node *) expr));
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ if (IsA(expr, Const))
+ {
+ *value = ((Const *) expr)->constvalue;
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index ac85a79023..4dcca8ba58 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,22 +1256,32 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run each expression through const-simplification and
- * canonicalization similar to check constraints.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- pcqual = (List *) canonicalize_qual((Expr *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run each expression through const-simplification and
+ * canonicalization similar to check constraints.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ pcqual = (List *) canonicalize_qual((Expr *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1856,6 +1865,11 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..4e9281d3d5 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,91 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
+/*
+ * PartScanKeyInfo
+ * Information about partition lookup keys to be passed to
+ * get_partitions_for_keys()
+ *
+ * Stores Datums and nullness properties found in clauses which match the
+ * partition key. Datum arrays eqkeys, minkeys, and maxkeys are indexed by
+ * partition key number, whereas ne_datums is not. Bitmapsets keyisnull and
+ * keyisnotnull have a bit for each partition key.
+ */
+typedef struct PartScanKeyInfo
+{
+ /*
+ * Equality lookup key. Used to store known Datum values from clauses
+ * compared by an equality operation to the partition key.
+ */
+ Datum eqkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Lower and upper bounds on a sequence of selected partitions. These may
+ * contain values for only a prefix of the partition keys.
+ */
+ Datum minkeys[PARTITION_MAX_KEYS];
+ Datum maxkeys[PARTITION_MAX_KEYS];
+
+ /*
+ * Number of values stored in the corresponding array above
+ */
+ int n_eqkeys;
+ int n_minkeys;
+ int n_maxkeys;
+
+ /*
+ * Properties to mark if the clauses corresponding to the datums stored in
+ * minkeys and maxkeys, respectively, are inclusive of the stored value or
+ * not.
+ */
+ bool min_incl;
+ bool max_incl;
+
+ /*
+ * Datum values from clauses containing <> operator. Note that, unlike
+ * the arrays above, the following array is not indexed by partition
+ * key. We only ever use this array for list partitioning and there
+ * can only be one partition key in that case anyway.
+ */
+ Datum *ne_datums;
+ int n_ne_datums;
+
+ /*
+ * Information about nullness of the partition keys, either specified
+ * explicitly in the query (in the form of an IS [NOT] NULL clause) or
+ * implied from strict clauses matching the partition key.
+ */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartScanKeyInfo;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +158,8 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ PartScanKeyInfo *keys);
+
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 08a177dac4..b687924443 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -663,6 +665,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h
index ba4fa4b68b..3c2f54964b 100644
--- a/src/include/optimizer/clauses.h
+++ b/src/include/optimizer/clauses.h
@@ -84,5 +84,7 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node);
extern Query *inline_set_returning_function(PlannerInfo *root,
RangeTblEntry *rte);
+extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
+ Oid result_collation);
#endif /* CLAUSES_H */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..b654691e9b
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+/*
+ * Stores clauses which were matched to a partition key.
+ *
+ * Each matching "operator" clause is stored in the 'keyclauses' list for the
+ * partition key that it was matched to, except if the operator is <>, in
+ * which case, the clause is added to the 'ne_clauses' list.
+ *
+ * Boolean OR clauses with at least one argument clause that matches a partition
+ * key are added to the 'or_clauses' list.
+ *
+ * Based on an IS NULL or IS NOT NULL clause that was matched to a partition
+ * key, the corresponding bit in 'keyisnull' or 'keyisnotnull' is set. A bit
+ * in 'keyisnotnull' may also be set when a strict OpExpr is encountered for
+ * the given partition key.
+ */
+typedef struct PartitionClauseInfo
+{
+ /* Lists of clauses indexed by the partition key */
+ List *keyclauses[PARTITION_MAX_KEYS];
+
+ /* Each member is a List itself of a given OR clause's arguments. */
+ List *or_clauses;
+
+ /*
+ * Each member is a List itself of PartitionClauseInfos for the arguments
+ * of a given OR clause. Both this and or_clauses should be iterated
+ * together using forboth() macro.
+ */
+ List *or_partclauseinfos;
+
+ /* List of clauses containing <> operator. */
+ List *ne_clauses;
+
+ /* Nth (0 <= N < partnatts) bit set if the key is NULL or NOT NULL. */
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+
+ /* True if at least one of the above fields contains valid information. */
+ bool foundkeyclauses;
+
+ /* True if mutually contradictory clauses were found. */
+ bool constfalse;
+} PartitionClauseInfo;
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern PartitionClauseInfo *generate_partition_clauses(
+ PartitionPruneContext *context,
+ List *clauses);
+extern Bitmapset *get_partitions_from_clauses(PartitionPruneContext *context,
+ PartitionClauseInfo *partclauseinfo);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index a79f891da7..11a259ca25 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1715,11 +1715,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1906,11 +1902,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 348719bd62..948cad4c3d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -519,15 +517,13 @@ explain (costs off) select * from rlp where a <= 31;
Filter: (a <= 31)
-> Seq Scan on rlp5_1
Filter: (a <= 31)
- -> Seq Scan on rlp5_default
- Filter: (a <= 31)
-> Seq Scan on rlp_default_10
Filter: (a <= 31)
-> Seq Scan on rlp_default_30
Filter: (a <= 31)
-> Seq Scan on rlp_default_default
Filter: (a <= 31)
-(29 rows)
+(27 rows)
explain (costs off) select * from rlp where a = 1 or a = 7;
QUERY PLAN
@@ -575,9 +571,7 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_2
Filter: ((a > 20) AND (a < 27))
- -> Seq Scan on rlp4_default
- Filter: ((a > 20) AND (a < 27))
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -651,8 +645,6 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
QUERY PLAN
-------------------------------------------------------------------
Append
- -> Seq Scan on rlp2
- Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3abcd
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3efgh
@@ -661,7 +653,7 @@ explain (costs off) select * from rlp where (a = 1 and a = 3) or (a > 1 and a =
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-> Seq Scan on rlp3_default
Filter: (((a = 1) AND (a = 3)) OR ((a > 1) AND (a = 15)))
-(11 rows)
+(9 rows)
-- multi-column keys
create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
@@ -716,9 +708,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -894,6 +884,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -904,7 +896,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -965,9 +957,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1009,24 +1003,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1036,33 +1026,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1088,4 +1067,411 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p0 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p1 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 514f8e5ce1..08fc2dbc21 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,104 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work in all cases below
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune;
--
2.11.0
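
For readers skimming the tests above, here is a toy sketch of how the keyisnull / keyisnotnull / constfalse fields of PartitionClauseInfo could interact to produce the "One-Time Filter: false" plan for `a <> 'a' and a is null`. It only illustrates the bookkeeping and is not code from the patch. The patch uses Bitmapset, whereas this sketch assumes a plain 32-bit mask, and ToyClauseInfo, toy_note_is_null() and toy_note_is_not_null() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PartitionClauseInfo. */
typedef struct ToyClauseInfo
{
    uint32_t    keyisnull;          /* bit N set: key N must be NULL */
    uint32_t    keyisnotnull;       /* bit N set: key N must be NOT NULL */
    bool        foundkeyclauses;    /* some clause matched a partition key */
    bool        constfalse;         /* contradictory clauses were found */
} ToyClauseInfo;

/* Record "key IS NULL"; contradicts an earlier NOT NULL requirement. */
void
toy_note_is_null(ToyClauseInfo *info, int keyno)
{
    uint32_t    bit;

    assert(keyno >= 0 && keyno < 32);
    bit = UINT32_C(1) << keyno;
    info->foundkeyclauses = true;
    if (info->keyisnotnull & bit)
        info->constfalse = true;
    info->keyisnull |= bit;
}

/*
 * Record "key IS NOT NULL".  A strict operator clause such as a <> 'a'
 * implies this as well, because it cannot be true for a NULL key.
 */
void
toy_note_is_not_null(ToyClauseInfo *info, int keyno)
{
    uint32_t    bit;

    assert(keyno >= 0 && keyno < 32);
    bit = UINT32_C(1) << keyno;
    info->foundkeyclauses = true;
    if (info->keyisnull & bit)
        info->constfalse = true;
    info->keyisnotnull |= bit;
}
```

With this bookkeeping, recording NOT NULL for key 0 (from the strict `<>` clause) followed by IS NULL for the same key sets constfalse, while requirements on different keys do not conflict.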
v35-0003-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From a70b5bd53e551a308236c57f57bea4569d7b0c35 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v35 3/3] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan
is built, by which time it is known which partitioned tables
remain unpruned.
That means we no longer need the PartitionedChildRelInfo node or
some of the related code in prepunion.c.
---
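
As a rough illustration of the "bubble up" idea described in the commit message, the following toy sketch collects the RT indexes of partitioned tables from the live (unpruned) children into their parent. It is not code from the patch: ToyRel and collect_partitioned_rels() are hypothetical stand-ins for RelOptInfo and the logic in set_append_rel_size()/set_append_rel_pathlist(), and fixed-size arrays are assumed in place of PostgreSQL's List.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RELS 16

/* Hypothetical, simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
    int         rti;                /* range table index */
    bool        is_partitioned;     /* is this a partitioned table? */
    bool        is_dummy;           /* pruned, i.e. proven empty */
    int         nchildren;
    struct ToyRel *children[TOY_MAX_RELS];
    int         npartitioned;
    int         partitioned_child_rels[TOY_MAX_RELS];
} ToyRel;

/*
 * Fill rel->partitioned_child_rels with rel's own RT index followed by
 * the indexes bubbled up from every child that was not pruned.
 */
void
collect_partitioned_rels(ToyRel *rel)
{
    int         i;
    int         j;

    rel->npartitioned = 0;
    if (!rel->is_partitioned)
        return;                     /* leaf partitions contribute nothing */

    /* Start with this rel's own index, as set_append_rel_size() does. */
    rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
    {
        ToyRel     *child = rel->children[i];

        if (child->is_dummy)
            continue;               /* pruned subtree: skip entirely */

        collect_partitioned_rels(child);
        for (j = 0; j < child->npartitioned; j++)
        {
            assert(rel->npartitioned < TOY_MAX_RELS);
            rel->partitioned_child_rels[rel->npartitioned++] =
                child->partitioned_child_rels[j];
        }
    }
}
```

In this sketch, a root with RT index 1 that has one live partitioned child (index 2) and one pruned partitioned child (index 4) ends up with the list {1, 2}; the pruned sub-partitioned table never appears.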
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 266a3ef8ef..169c697c08 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2260,21 +2260,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5040,9 +5025,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index bbffc87842..2021c085d5 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3185,9 +3175,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 011d2a3fa9..fe309a6b54 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4072,9 +4063,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 542c4a2bca..08570ce25d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index de1257d9c2..4b5713f2f8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -556,7 +556,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -571,6 +570,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1113,12 +1113,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1190,10 +1190,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1424,6 +1426,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1524,6 +1530,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1531,7 +1552,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5937,65 +5958,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..c097da6425 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -260,7 +260,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b687924443..1d801b226f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -671,6 +675,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2123,27 +2128,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
On Thu, Mar 1, 2018 at 6:57 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/02/28 19:14, Ashutosh Bapat wrote:
On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote:
BTW, should there be a relevant test in partition_join.sql? If yes,
attached a patch (partitionwise-join-collation-test-1.patch) to add one.
A partition-wise join path will be created but discarded because of
higher cost. This test won't see it in that case. So, please add some
data like other tests and add command to analyze the partitioned
tables. That kind of protects from something like that.
Thanks for the review.
Hmm, the added test is such that the partition collations won't match, so
partition-wise join won't be considered at all due to differing
PartitionSchemes, unless I'm missing something.
The point is we wouldn't know whether PWJ was not selected because of
PartitionScheme mismatch OR the PWJ paths were expensive compared to
non-PWJ as happens with empty tables. In both the cases we will see a
non-PWJ "plan" although in one case PWJ was not possible and in the
other it was possible. I think what we want to test is that PWJ is not
possible with collation mismatch.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Feb 27, 2018 at 4:33 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
I'm very skeptical about this patch's desire to remove the static
qualifier from evaluate_expr(). Why does this patch need that and
constraint exclusion not need it? Why should this patch not instead
be using eval_const_expressions? partkey_datum_from_expr() is
prepared to give up if evaluate_expr() doesn't return a Const, but
there's nothing in evaluate_expr() to make it give up if, for example,
the input is -- or contains -- a volatile function, e.g. random().
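To make the volatility concern concrete, here is a minimal, self-contained sketch in plain C. It does not use PostgreSQL's node types or planner functions: ToyExpr, toy_contains_volatile() and toy_try_fold() are hypothetical names invented for this illustration only. The idea it tries to show is that a folding routine should give up when any node in the tree is volatile, rather than evaluating the expression once and treating the result as a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression tree; NOT PostgreSQL's Node/Expr types. */
typedef enum { TOY_CONST, TOY_ADD, TOY_FUNC } ToyKind;

typedef struct ToyExpr
{
    ToyKind     kind;
    int         value;          /* used by TOY_CONST */
    bool        is_volatile;    /* used by TOY_FUNC, e.g. a random()-like call */
    const struct ToyExpr *left;     /* used by TOY_ADD */
    const struct ToyExpr *right;    /* used by TOY_ADD */
} ToyExpr;

/* Return true if any node in the tree is a volatile function call. */
static bool
toy_contains_volatile(const ToyExpr *e)
{
    if (e == NULL)
        return false;
    if (e->kind == TOY_FUNC && e->is_volatile)
        return true;
    return toy_contains_volatile(e->left) || toy_contains_volatile(e->right);
}

/*
 * Try to reduce the expression to a constant.  Returns false (gives up) if
 * the tree contains a volatile call or a function this toy cannot evaluate.
 */
static bool
toy_try_fold(const ToyExpr *e, int *result)
{
    int         l = 0;
    int         r = 0;

    assert(result != NULL);
    if (e == NULL || toy_contains_volatile(e))
        return false;

    switch (e->kind)
    {
        case TOY_CONST:
            *result = e->value;
            return true;
        case TOY_ADD:
            if (!toy_try_fold(e->left, &l) || !toy_try_fold(e->right, &r))
                return false;
            *result = l + r;
            return true;
        case TOY_FUNC:
            return false;       /* the toy never evaluates function calls */
    }
    return false;
}
```

In this sketch a caller that gets false back simply treats the value as unknown at plan time, which is the "give up" behaviour the paragraph above asks for.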
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
This can be written a lot more compactly as rel->has_default_part =
OidIsValid(get_default_oid_from_partdesc(partdesc));
PartitionPruneContext has no comment explaining its general purpose; I
think it should.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Feb 28, 2018 at 11:53 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
You can't copy an FmgrInfo by just applying memcpy() to it. Use fmgr_info_copy.
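The hazard with a bytewise copy can be illustrated with a small stand-alone sketch. ToyFuncInfo, toy_func_info_copy() and toy_copy_support_funcs() below are hypothetical stand-ins, not PostgreSQL's FmgrInfo or fmgr_info_copy(); the sketch only assumes that the struct carries a private cache pointer tied to an owning memory context, so a copy must reset that pointer instead of aliasing it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function-lookup struct that carries private,
 * context-owned state.  This is NOT PostgreSQL's FmgrInfo.
 */
typedef struct ToyFuncInfo
{
    int         (*fn_addr) (int);   /* the function to call */
    int         owner_ctx;          /* id of the context owning fn_extra */
    void       *fn_extra;           /* per-use cache, owned by owner_ctx */
} ToyFuncInfo;

/* Sample support function used only to populate fn_addr. */
static int
toy_negate(int x)
{
    return -x;
}

/*
 * Copy src into dst for use in dest_ctx.  Plain fields are copied bytewise,
 * but the cache pointer is reset so that dst never aliases state owned by
 * the source's context.
 */
static void
toy_func_info_copy(ToyFuncInfo *dst, const ToyFuncInfo *src, int dest_ctx)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(ToyFuncInfo));
    dst->owner_ctx = dest_ctx;
    dst->fn_extra = NULL;
}

/* Copy an array of nkeys entries one element at a time. */
static void
toy_copy_support_funcs(ToyFuncInfo *dst, const ToyFuncInfo *src,
                       int nkeys, int dest_ctx)
{
    int         i;

    for (i = 0; i < nkeys; i++)
        toy_func_info_copy(&dst[i], &src[i], dest_ctx);
}
```

Applied to the quoted hunk, the analogous change would be a per-column loop over partnatts entries rather than one memcpy() of the whole array; the exact call and memory context to use are left to the patch.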
I don't like the comments at the top of partprune.c very much. It
seems strange to document individual functions here; those functions
can (and should) be documented in their individual header comments.
What we should have at the top of the file is a discussion of the
overall theory of operation of this module, something that currently
seems not to exist anywhere in the patch. I tried to figure that out
by looking at the new data structures the patch introduces:
PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
PartClause. It looks like the general idea is that code that
wants to use these facilities does the following:
Step 1. Generate a PartitionPruneContext. In this patch, this seems
to consist entirely of copying information from the RelOptInfo or its
PartitionScheme.
Step 2. Call generate_partition_clauses() to extract relevant clauses
from rel->baserestrictinfo and generate a PartClauseInfo.
Step 3. Call get_partitions_from_clauses() to generate a set of
unpruned partition indexes. Internally, that function will first
populate a PartScanKeyInfo from the PartClauseInfo by calling
extract_bounding_datums(). Then it calls get_partitions_for_keys()
which generates the set of unpruned partition indexes from the
PartitionPruneContext and PartScanKeyInfo.
I guess there are two main things about this that seem odd to me:
1. I don't see why the partition pruning code should ever be
responsible for evaluating anything itself, as it does currently via
evaluate_expr(). For plan-time partition pruning, we are already
using eval_const_expressions() to perform as much Const-simplification
as possible. If we see an OpExpr with a partitioning column on one
side, then the other side is either a Const, in which case we can
perform pruning, or it's something whose value can't be known until
runtime, in which case we cannot. There should be no case in which
eval_const_expressions() fails to simplify something to a constant yet
we know the value at plan time; if such a case exists, then it's an
opportunity to improve eval_const_expressions(), not a reason to be
inconsistent with the rules it applies. For run-time pruning, it is
probably semantically wrong and certainly undesirable from a
performance perspective to spin up a new ExecutorState and a new
ExprState every time we need to reassess the decision about which
partitions need to be scanned. Instead, the expressions whose values
are inputs to the pruning process should be computed in the
ExecutorState/ExprState for the main query and the results should be
passed to the partition-pruning code as inputs to the decision-making
process.
2. All processing of clauses should happen at plan time, not run time,
so that we're not repeating work. If, for example, a prepared query
is executed repeatedly, the clauses should get fully processed when
it's planned, and then the only thing that should happen at execution
time is that we take the values to which we now have access and use
them to decide what to prune. With this design, we're bound to repeat
at least the portion of the work done by get_partitions_from_clauses()
at runtime, and as things stand, it looks like the current version of
the run-time partition pruning patch repeats the
generate_partition_clauses() work at runtime as well.
It seems to me as though what we ought to be doing is extracting
clauses that are of the correct general form and producing a list of
<partition-column-index>, <partition-operator-strategy>, <expression>
triplets sorted by increasing partition column index and operator
strategy number and in a form that can be represented as a Node tree.
This can be attached to the plan for use at runtime or, if there are
at least some constants among the expressions, used at plan time for
partition exclusion. So the main interfaces would be something like
this:
extern List *extract_partition_pruning_clauses(RelOptInfo *rel);
extern PartitionPruneContext *CreatePartitionPruneContext(List *);
extern Bitmapset *PerformStaticPartitionPruning(PartitionPruneContext *);
extern Bitmapset *PerformDynamicPartitionPruning(PartitionPruneContext
*, Datum *values, bool *isnull);
In PerformStaticPartitionPruning(), we'd just ignore anything for
which the expression was non-constant; to call
PerformDynamicPartitionPruning(), the caller would need to evaluate
the expressions from the PartitionPruneContext using the appropriate
EState and then pass the results into this function.
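A rough, self-contained sketch of how such a static/dynamic split might behave is given below. Everything in it is an assumption made for illustration: the Toy* types, the single integer range-partition key, the unsigned bitmask result, and the rule that a NULL run-time value matches nothing. It is not the interface proposed above, only a toy with the same shape: the steps are prepared once, static pruning skips non-constant steps, and dynamic pruning takes the evaluated values from the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_LT, TOY_EQ, TOY_GT } ToyStrategy;

/* One <key column, strategy, expression> triplet (hypothetical layout). */
typedef struct ToyPruneStep
{
    int         keyno;          /* partition key column; only 0 is supported */
    ToyStrategy strategy;
    bool        is_const;       /* true: const_value; false: values[param_id] */
    int         const_value;
    int         param_id;
} ToyPruneStep;

/* Range partition i holds integer keys in [lower[i], upper[i]). */
typedef struct ToyPruneContext
{
    int         nparts;
    const int  *lower;
    const int  *upper;
    int         nsteps;
    const ToyPruneStep *steps;
} ToyPruneContext;

/* Bitmask of partitions that may hold a key satisfying "key <op> v". */
static unsigned
toy_match_one(const ToyPruneContext *cxt, ToyStrategy strategy, int v)
{
    unsigned    result = 0;
    int         i;

    for (i = 0; i < cxt->nparts; i++)
    {
        bool        keep = false;

        switch (strategy)
        {
            case TOY_LT:
                keep = cxt->lower[i] < v;
                break;
            case TOY_EQ:
                keep = cxt->lower[i] <= v && v < cxt->upper[i];
                break;
            case TOY_GT:
                keep = cxt->upper[i] - 1 > v;
                break;
        }
        if (keep)
            result |= 1u << i;
    }
    return result;
}

/* Shared worker: values == NULL means plan time, so skip non-constant steps. */
static unsigned
toy_prune(const ToyPruneContext *cxt, const int *values, const bool *isnull)
{
    unsigned    result;
    int         i;

    assert(cxt != NULL && cxt->nparts > 0 && cxt->nparts < 32);
    result = (1u << cxt->nparts) - 1;   /* start with every partition */

    for (i = 0; i < cxt->nsteps; i++)
    {
        const ToyPruneStep *step = &cxt->steps[i];
        int         v;

        assert(step->keyno == 0);
        if (step->is_const)
            v = step->const_value;
        else if (values == NULL)
            continue;           /* value unknown until run time */
        else if (isnull[step->param_id])
            return 0;           /* assumed strict operator: NULL matches nothing */
        else
            v = values[step->param_id];

        result &= toy_match_one(cxt, step->strategy, v);
    }
    return result;
}

/* Plan-time entry point: uses only the constant steps. */
static unsigned
toy_static_prune(const ToyPruneContext *cxt)
{
    return toy_prune(cxt, NULL, NULL);
}

/* Run-time entry point: the caller has already evaluated the expressions. */
static unsigned
toy_dynamic_prune(const ToyPruneContext *cxt, const int *values,
                  const bool *isnull)
{
    assert(values != NULL && isnull != NULL);
    return toy_prune(cxt, values, isnull);
}
```

Note that the pruning code in this toy never evaluates an expression itself; it only compares already-computed integers against the partition bounds.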
I realize that I'm hand-waving a bit here, or maybe more than a bit,
but in general I think it's right to imagine that the work here needs
to be segregated into two very well-separated phases. In the first
phase, we do all possible work to assess which clauses are relevant
and put them in a form where the actual pruning work can be done as
cheaply as possible once we know the values. This phase always
happens at plan time. In the second phase, we get the values, either
by extracting them from Const nodes at plan time or evaluating
expressions at runtime, and then decide which partitions to eliminate.
This phase happens at plan time if we have enough constants available
to do something useful and at runtime if we have enough non-constants
to do something useful. Right now there doesn't seem to be a clean
separation between these two phases and I think that's not good.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2 March 2018 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
I don't like the comments at the top of partprune.c very much. It
seems strange to document individual functions here; those functions
can (and should) be documented in their individual header comments.
What we should have at the top of the file is a discussion of the
overall theory of operation of this module, something that currently
seems not to exist anywhere in the patch. I tried to figure that out
by looking at the new data structures the patch introduces:
PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
PartClause. It looks like the general idea is that code that
wants to use these facilities does the following:
Step 1. Generate a PartitionPruneContext. In this patch, this seems
to consist entirely of copying information from the RelOptInfo or its
PartitionScheme.
Step 2. Call generate_partition_clauses() to extract relevant clauses
from rel->baserestrictinfo and generate a PartClauseInfo.
Step 3. Call get_partitions_from_clauses() to generate a set of
unpruned partition indexes. Internally, that function will first
populate a PartScanKeyInfo from the PartClauseInfo by calling
extract_bounding_datums(). Then it calls get_partitions_for_keys()
which generates the set of unpruned partition indexes from the
PartitionPruneContext and PartScanKeyInfo.
Hi Robert,
I feel I should step in here and answer this part as it was me who
first came up with the idea of the context struct. I've typed up
something below which is my first cut at what I'd have imagined the
header comment of partprune.c should look like. Some parts are only
relevant after run-time pruning is also using this stuff. I've tried to
highlight those areas, but I'm not sure how much mention of that, if any,
there should be as part of this patch.
Here goes:
partprune.c
Allows efficient identification of the minimal set of partitions which match a
given set of clauses. This makes it possible to skip partitions which cannot
possibly contain tuples matching those clauses.
This module breaks the process of determining the matching partitions into
two distinct steps, each of which has its own function which is externally
visible. The reason for not performing everything in one step is that there
are times when we may wish to perform the 2nd step multiple times over. The
steps could be thought of as a compilation step followed by an execution step.
Step 1 (compilation):
Pre-process the given list of clauses and attempt to match individual clauses
up to a partition key.
The end result of this process is a PartitionClauseInfo containing details of
each clause found to match the partition key. This output is required as
input for the 2nd step.
Step 2 (execution):
Step 2 outputs the minimal set of matching partitions based on the input from
step 1.
Internally, this step is broken down into smaller sub-steps, each of which
is explained in detail in the comments in the corresponding function.
Step 2 can be executed multiple times for its input values. The inputs to this
step are not modified by the processing done within. It is expected that this
step is executed multiple times in cases where the matching partitions must be
determined during query execution. A subsequent evaluation of this step will
be required whenever a parameter which was found in a clause matching the
partition key changes its value.
PartitionPruneContext:
Each of the steps described above also requires an input of a
PartitionPruneContext. This stores all of the remaining required inputs to
each step. The contents will vary slightly depending on where the step is
being called from, i.e., the planner or the executor. For example,
during query planning, we're unable to determine the value of a Param found
matching the partition key. When this step is called from the executor the
PlanState can be set in the context which allows evaluation of these Params
into Datum values. *** Only after run-time pruning ***
The PartitionPruneContext is also required since many of the query planner
node types are unavailable to the executor, which means that the source
information used to populate the context will vary depending on if it's being
called from the query planner or executor.
*** Only after run-time pruning ***
The context is also modified during step 1 to record all of the Param IDs
which were found to match the partition key.
-------------
Hopefully that also helps explain the intentions with the current code structure.
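The compile-then-execute split described in the draft comment can be sketched in a few lines of stand-alone C. The names ToyClause, ToyClauseInfo, toy_compile() and toy_execute() are hypothetical, as is the partitioning rule (key modulo the number of partitions, which is a toy and not PostgreSQL's hash partitioning). The only point being illustrated is that step 1 runs once, while step 2 reads its input without modifying it and can be re-run whenever a parameter value changes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CLAUSES 8

/* A toy "column = $param" clause (hypothetical). */
typedef struct ToyClause
{
    int         column;
    int         param_id;
} ToyClause;

/* Output of step 1: parameters of the clauses that match the partition key. */
typedef struct ToyClauseInfo
{
    int         nmatched;
    int         param_ids[TOY_MAX_CLAUSES];
} ToyClauseInfo;

/* Step 1 (compilation): keep only clauses on the partition key column. */
static void
toy_compile(const ToyClause *clauses, int nclauses, int partkey_column,
            ToyClauseInfo *info)
{
    int         i;

    assert(clauses != NULL && info != NULL && nclauses <= TOY_MAX_CLAUSES);
    info->nmatched = 0;
    for (i = 0; i < nclauses; i++)
    {
        if (clauses[i].column == partkey_column)
            info->param_ids[info->nmatched++] = clauses[i].param_id;
    }
}

/*
 * Step 2 (execution): return a bitmask of partitions that may match, given
 * the current parameter values.  The compiled info is read-only here, so
 * this can be called again whenever a parameter changes.
 */
static unsigned
toy_execute(const ToyClauseInfo *info, int nparts, const int *param_values)
{
    unsigned    result;
    int         i;

    assert(info != NULL && param_values != NULL);
    assert(nparts > 0 && nparts < 32);
    result = (1u << nparts) - 1;        /* no matching clause: keep all */
    for (i = 0; i < info->nmatched; i++)
    {
        int         v = param_values[info->param_ids[i]];

        assert(v >= 0);                 /* toy rule: partition = v % nparts */
        result &= 1u << (v % nparts);
    }
    return result;
}
```

A caller would invoke toy_compile() once and then toy_execute() once per distinct set of parameter values.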
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/03/01 21:56, Robert Haas wrote:
On Tue, Feb 27, 2018 at 4:33 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached an updated version in which I incorporated some of the revisions
that David Rowley suggested to OR clauses handling (in partprune.c) that
he posted as a separate patch on the run-time pruning thread [1].
I'm very skeptical about this patch's desire to remove the static
qualifier from evaluate_expr(). Why does this patch need that and
constraint exclusion not need it? Why should this patch not instead
be using eval_const_expressions? partkey_datum_from_expr() is
prepared to give up if evaluate_expr() doesn't return a Const, but
there's nothing in evaluate_expr() to make it give up if, for example,
the input is -- or contains -- a volatile function, e.g. random().
Thinking on this a bit, I have removed the evaluate_expr() business from
partkey_datum_from_expr() and thus switched evaluate_expr() back to static.
Let me explain why I'd added it there in the first place -- if the constant
expression received in partkey_datum_from_expr() was not of the same type
as that of the partition key, it'd try to coerce_to_target_type() the
input expression to the partition key type which may result in a non-Const
expression. We'd turn it back into a Const by calling evaluate_expr(). I
thought the coercion was needed because we'd be comparing the resulting
datum with the partition bound datums using a partition comparison
function that would require its arguments to be of given types.
But I realized we don't need the coercion. Earlier steps would have
determined that the clause from which the expression originated contains
an operator that is compatible with the partitioning operator family. If
so, the type of the expression in question, even though different from the
partition key type, would be binary coercible with it. So, it'd be okay
to pass the datum extracted from such expression to the partition
comparison function to compare it with datums in PartitionBoundInfo,
without performing any coercion.
+ if (OidIsValid(get_default_oid_from_partdesc(partdesc)))
+ rel->has_default_part = true;
+ else
+ rel->has_default_part = false;
This can be written a lot more compactly as rel->has_default_part =
OidIsValid(get_default_oid_from_partdesc(partdesc));
Indeed, will fix.
PartitionPruneContext has no comment explaining its general purpose; I
think it should.
Will fix.
Thanks,
Amit
Thanks for your comments.
On 2018/03/02 4:13, Robert Haas wrote:
On Wed, Feb 28, 2018 at 11:53 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached updated patches.
+ memcpy(part_scheme->partsupfunc, partkey->partsupfunc,
+ sizeof(FmgrInfo) * partnatts);
You can't copy an FmgrInfo by just applying memcpy() to it. Use fmgr_info_copy.
Oops, will fix.
I don't like the comments at the top of partprune.c very much. It
seems strange to document individual functions here; those functions
can (and should) be documented in their individual header comments.
What we should have at the top of the file is a discussion of the
overall theory of operation of this module, something that currently
seems not to exist anywhere in the patch.
Sorry about that. Thanks to your comments here and David's reply, I will
try to address that in the next version.
I tried to figure that out
by looking at the new data structures the patch introduces:
PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
PartClause. It looks like the general idea is that code that
wants to use these facilities does the following:
Step 1. Generate a PartitionPruneContext. In this patch, this seems
to consist entirely of copying information from the RelOptInfo or its
PartitionScheme.
Step 2. Call generate_partition_clauses() to extract relevant clauses
from rel->baserestrictinfo and generate a PartClauseInfo.
Step 3. Call get_partitions_from_clauses() to generate a set of
unpruned partition indexes. Internally, that function will first
populate a PartScanKeyInfo from the PartClauseInfo by calling
extract_bounding_datums(). Then it calls get_partitions_for_keys()
which generates the set of unpruned partition indexes from the
PartitionPruneContext and PartScanKeyInfo.
I guess there are two main things about this that seem odd to me:
1. I don't see why the partition pruning code should ever be
responsible for evaluating anything itself, as it does currently via
evaluate_expr(). For plan-time partition pruning, we are already
using eval_const_expressions() to perform as much Const-simplification
as possible. If we see an OpExpr with a partitioning column on one
side, then the other side is either a Const, in which case we can
perform pruning, or it's something whose value can't be known until
runtime, in which case we cannot. There should be no case in which
eval_const_expressions() fails to simplify something to a constant yet
we know the value at plan time; if such a case exists, then it's an
opportunity to improve eval_const_expressions(), not a reason to be
inconsistent with the rules it applies. For run-time pruning, it is
probably semantically wrong and certainly undesirable from a
performance perspective to spin up a new ExecutorState and a new
ExprState every time we need to reassess the decision about which
partitions need to be scanned. Instead, the expressions whose values
are inputs to the pruning process should be computed in the
ExecutorState/ExprState for the main query and the results should be
passed to the partition-pruning code as inputs to the decision-making
process.
As I said in my earlier reply, I have removed the part that involved the
pruning code calling evaluate_expr().
2. All processing of clauses should happen at plan time, not run time,
so that we're not repeating work. If, for example, a prepared query
is executed repeatedly, the clauses should get fully processed when
it's planned, and then the only thing that should happen at execution
time is that we take the values to which we now have access and use
them to decide what to prune. With this design, we're bound to repeat
at least the portion of the work done by get_partitions_from_clauses()
at runtime, and as things stand, it looks like the current version of
the run-time partition pruning patch repeats the
generate_partition_clauses() work at runtime as well.
It seems to me as though what we ought to be doing is extracting
clauses that are of the correct general form and producing a list of
<partition-column-index>, <partition-operator-strategy>, <expression>
triplets sorted by increasing partition column index and operator
strategy number and in a form that can be represented as a Node tree.
This can be attached to the plan for use at runtime or, if there are
at least some constants among the expressions, used at plan time for
partition exclusion. So the main interfaces would be something like
this:
extern List *extract_partition_pruning_clauses(RelOptInfo *rel);
extern PartitionPruneContext *CreatePartitionPruneContext(List *);
extern Bitmapset *PerformStaticPartitionPruning(PartitionPruneContext *);
extern Bitmapset *PerformDynamicPartitionPruning(PartitionPruneContext *,
                                                 Datum *values, bool *isnull);
In PerformStaticPartitionPruning(), we'd just ignore anything for
which the expression was non-constant; to call
PerformDynamicPartitionPruning(), the caller would need to evaluate
the expressions from the PartitionPruneContext using the appropriate
EState and then pass the results into this function.
I realize that I'm hand-waving a bit here, or maybe more than a bit,
but in general I think it's right to imagine that the work here needs
to be segregated into two very well-separated phases. In the first
phase, we do all possible work to assess which clauses are relevant
and put them in a form where the actual pruning work can be done as
cheaply as possible once we know the values. This phase always
happens at plan time. In the second phase, we get the values, either
by extracting them from Const nodes at plan time or evaluating
expressions at runtime, and then decide which partitions to eliminate.
This phase happens at plan time if we have enough constants available
to do something useful and at runtime if we have enough non-constants
to do something useful. Right now there doesn't seem to be a clean
separation between these two phases and I think that's not good.
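To make the two-phase split concrete, here is a minimal standalone sketch in plain C. It is not PostgreSQL code: PruneTriplet, sort_triplets() and resolve_values() are hypothetical names, a long stands in for a Datum, and expr_id is an assumed handle for an expression that the caller evaluates at run time. Phase 1 only orders the triplets by partition column index and strategy number; phase 2 fills in values, skipping non-constant entries when no run-time values are supplied (the static case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical plan-time product: one entry per usable clause. */
typedef struct PruneTriplet
{
    int  keyno;     /* partition column index */
    int  strategy;  /* operator strategy number */
    bool is_const;  /* true if the value is known at plan time */
    long value;     /* only meaningful when is_const is true */
    int  expr_id;   /* assumed index into caller-evaluated run-time values */
} PruneTriplet;

static int
triplet_cmp(const void *a, const void *b)
{
    const PruneTriplet *x = a;
    const PruneTriplet *y = b;

    if (x->keyno != y->keyno)
        return (x->keyno > y->keyno) - (x->keyno < y->keyno);
    return (x->strategy > y->strategy) - (x->strategy < y->strategy);
}

/* Phase 1 (plan time): order triplets by column index, then strategy. */
void
sort_triplets(PruneTriplet *t, size_t n)
{
    qsort(t, n, sizeof *t, triplet_cmp);
}

/*
 * Phase 2 input gathering. With runtime_values == NULL this models the
 * static case: non-constant triplets are marked unknown and ignored.
 * Otherwise every triplet gets a value. Returns the number of known values.
 */
size_t
resolve_values(const PruneTriplet *t, size_t n, const long *runtime_values,
               long *values, bool *known)
{
    size_t nknown = 0;

    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (t[i].is_const)
        {
            values[i] = t[i].value;
            known[i] = true;
        }
        else if (runtime_values != NULL)
        {
            assert(t[i].expr_id >= 0);
            values[i] = runtime_values[t[i].expr_id];
            known[i] = true;
        }
        else
        {
            values[i] = 0;
            known[i] = false;
        }
        nknown += known[i];
    }
    return nknown;
}
```

The point of the sketch is only that the sorted triplet array is built once and never modified afterwards, so the second function can be called any number of times with different run-time values.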
Hmm, I see that things can be improved and various points you've mentioned
here to improve the high-level interfaces seem useful. I'll try to rework
the patch based on those and submit one early next week.
Looking at the rough interface sketch in your message, it seems that the
product of whatever steps we end up grouping into phase 1 should be
something that can be put into a Node tree (PartitionPruneContext?),
because we may need to attach it to the plan if some of the needed values
will only be made available during execution.
Given the patch's implementation, we'll have to make the structure of that
Node tree a bit more complex than a simple List. For one thing, the patch
handles OR clauses by performing pruning separately for each arm and then
combining partitions selected across OR arms using set union. By
"performing pruning" in the last sentence I meant following steps similar
to ones you wrote in your message:
1. Segregating pruning clauses into per-partition-key Lists, that is,
generate_partition_clauses() producing a PartitionClauseInfo,
2. Removing redundant clauses from each list, that is,
remove_redundant_clauses() to produce lists with just one member per
operator strategy for each partition key,
3. Extracting Datum values from the clauses to form equal/min/max tuples
and setting null or not null bits for individual keys, that is,
extract_bounding_datums() producing a PartScanKeyInfo, and
4. Finally pruning with those Datum tuples and null/not null info, that
is, get_partitions_for_keys().
Steps 2-4 are dependent on clauses providing Datums, which the clauses
may or may not all do. Depending on whether they do, we'll have to defer
those steps to run time.
So,
* What do we encode into the Node tree attached to the plan? Clauses that
haven't gone through steps 2 and 3 (something like PartitionClauseInfo)
or the product of step 3 (something like PartScanKeyInfo)?
* How do we account for OR clauses? Perhaps by having the aforementioned
Node trees nested inside the top-level one, wherein there will be one
nested node per arm of an OR clause.
Thanks,
Amit
On 2018/03/02 11:12, David Rowley wrote:
On 2 March 2018 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
I don't like the comments at the top of partprune.c very much. It
seems strange to document individual functions here; those functions
can (and should) be documented in their individual header comments.
What we should have at the top of the file is a discussion of the
overall theory of operation of this module, something that currently
seems not to exist anywhere in the patch. I tried to figure that out
by looking at the new data structures the patch introduces:
PartitionPruneContext, PartScanKeyInfo, PartitionClauseInfo, and
PartClause. It looks like the general idea is that code that
wants to use these facilities does the following:
Step 1. Generate a PartitionPruneContext. In this patch, this seems
to consist entirely of copying information from the RelOptInfo or its
PartitionScheme.
Step 2. Call generate_partition_clauses() to extract relevant clauses
from rel->baserestrictinfo and generate a PartClauseInfo.
Step 3. Call get_partitions_from_clauses() to generate a set of
unpruned partition indexes. Internally, that function will first
populate a PartScanKeyInfo from the PartClauseInfo by calling
extract_bounding_datums(). Then it calls get_partitions_for_keys()
which generates the set of unpruned partition indexes from the
PartitionPruneContext and PartScanKeyInfo.
Hi Robert,
I feel I should step in here and answer this part as it was me who
first came up with the idea of the context struct. I've typed up
something below which is my first cut at what I'd have imagined the
header comment of partprune.c should look like. Some parts are only
relevant after run-time pruning is also using this stuff. I've tried to
highlight those areas, I'm not sure how much or if there should be any
mention of that at all as part of this patch.
Here goes:
partprune.c
Allows efficient identification of the minimal set of partitions which match a
given set of clauses. Thus allowing useful things such as ignoring unneeded
partitions which cannot possibly contain tuples matching the given set of
clauses.
This module breaks the process of determining the matching partitions into
two distinct steps, each of which has its own function which is externally
visible outside of this module. The reason for not performing everything
in one step is down to the fact that there are times where we may wish to
perform the 2nd step multiple times over. The steps could be thought of as a
compilation step followed by an execution step.
Step 1 (compilation):
Pre-process the given list of clauses and attempt to match individual clauses
up to a partition key.
The end result of this process is a PartitionClauseInfo containing details of
each clause found to match the partition key. This output is required as
input for the 2nd step.
Step 2 (execution):
Step 2 outputs the minimal set of matching partitions based on the input from
step 1.
Internally, this step is broken down into smaller sub-steps, each of which
is explained in detail in the comments in the corresponding function.
Step 2 can be executed multiple times for its input values. The inputs to this
step are not modified by the processing done within. It is expected that this
step is executed multiple times in cases where the matching partitions must be
determined during query execution. A subsequent evaluation of this step will
be required whenever a parameter which was found in a clause matching the
partition key changes its value.
PartitionPruneContext:
Each of the steps described above also requires an input of a
PartitionPruneContext. This stores all of the remaining required inputs to
each step. The context will vary slightly depending on the context in which
the step is being called from; i.e. the planner or executor. For example,
during query planning, we're unable to determine the value of a Param found
matching the partition key. When this step is called from the executor the
PlanState can be set in the context which allows evaluation of these Params
into Datum values. *** Only after run-time pruning ***
The PartitionPruneContext is also required since many of the query planner
node types are unavailable to the executor, which means that the source
information used to populate the context will vary depending on if it's being
called from the query planner or executor.
*** Only after run-time pruning ***
The context is also modified during step 1 to record all of the Param IDs
which were found to match the partition key.
-------------
Hopefully that also helps explain the intentions with the current code structure.
Thanks David for writing this down.
Thanks,
Amit
On Fri, Mar 2, 2018 at 1:22 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
But I realized we don't need the coercion. Earlier steps would have
determined that the clause from which the expression originated contains
an operator that is compatible with the partitioning operator family. If
so, the type of the expression in question, even though different from the
partition key type, would be binary coercible with it.
That doesn't follow. Knowing that two types are in the same operator
family doesn't guarantee that the types are binary coercible. For
example, int8 is not binary-coercible to int2. Moreover, you'd better
be pretty careful about trying to cast int8 to int2 because it might
turn a query that would have returned no rows into one that fails
outright; that's not OK. Imagine that the user types:
SELECT * FROM partitioned_by_int2 WHERE a = 1000000000000;
I think what needs to happen with cross-type situations is that you
look in the opfamily for a comparator that takes the types you want as
input; if you can't find one, you have to give up on pruning. If you
do find one, then you use it. For example in the above query, once
you find btint28cmp, you can use that to compare the user-provided
constant against the range bounds for the various partitions to see
which one might contain it. You'll end up selecting the partition
with upper bound MAXVALUE if there is one, or no partition at all if
every partition has a finite upper bound. That's as well as we can do
with current infrastructure, I think.
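As an illustration of the cross-type approach, here is a small standalone C sketch. It does not use PostgreSQL's actual support functions: cmp_int2_int8() is a hypothetical stand-in for a comparator such as btint28cmp, and find_range_partition() assumes a single-column range-partitioned table whose partitions are described only by sorted, exclusive int2 upper bounds (the first partition is treated as unbounded below). The query constant is never narrowed to int2; the bound is widened instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical cross-type comparator: compares an int2 partition bound
 * with an int8 query constant by widening the bound.
 */
static int
cmp_int2_int8(int16_t bound, int64_t value)
{
    int64_t wide = (int64_t) bound;

    return (wide > value) - (wide < value);
}

/*
 * upper[] holds sorted, exclusive upper bounds. If has_maxvalue is nonzero,
 * one more partition with upper bound MAXVALUE follows the listed ones.
 * Returns the index of the only partition that might contain value, or -1
 * if no partition can contain it.
 */
int
find_range_partition(const int16_t *upper, size_t nbounds, int has_maxvalue,
                     int64_t value)
{
    size_t lo = 0;
    size_t hi = nbounds;

    assert(upper != NULL || nbounds == 0);

    /* Binary search for the first bound strictly greater than value. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cmp_int2_int8(upper[mid], value) > 0)
            hi = mid;
        else
            lo = mid + 1;
    }

    if (lo < nbounds)
        return (int) lo;
    return has_maxvalue ? (int) nbounds : -1;
}
```

With bounds 100, 200 and 300, the constant 1000000000000 selects no partition when every bound is finite and selects the extra MAXVALUE partition when there is one, which matches the outcome described above.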
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Looking at the rough interface sketch in your message, it seems that the
product of whatever steps we end up grouping into phase 1 should be
something that can be put into a Node tree (PartitionPruneContext?),
because we may need to attach it to the plan if some of the needed values
will only be made available during execution.
Right. You might also end up with two representations: a
Node-tree-style representation that contains all of the information we
need, and another, faster form into which it gets converted before
use.
Given the patch's implementation, we'll have to make the structure of that
Node tree a bit more complex than a simple List. For one thing, the patch
handles OR clauses by performing pruning separately for each arm and then
combining partitions selected across OR arms using set union. By
"performing pruning" in the last sentence I meant following steps similar
to ones you wrote in your message:
1. Segregating pruning clauses into per-partition-key Lists, that is,
generate_partition_clauses() producing a PartitionClauseInfo,
2. Removing redundant clauses from each list, that is,
remove_redundant_clauses() to produce lists with just one member per
operator strategy for each partition key,
3. Extracting Datum values from the clauses to form equal/min/max tuples
and setting null or not null bits for individual keys, that is,
extract_bounding_datums() producing a PartScanKeyInfo, and
4. Finally pruning with those Datum tuples and null/not null info, that
is, get_partitions_for_keys().
Steps 2-4 are dependent on clauses providing Datums, which the clauses
may or may not all do. Depending on whether they do, we'll have to defer
those steps to run time.
I don't see that there's a real need to perform step 2 at all. I
mean, if you have x > $1 and x > $2 in the query, you can just compute
the set of partitions for the first clause, compute the set of
partitions for the second clause, and then intersect. That doesn't
seem obviously worse than deciding which of $1 and $2 is greater and
then pruning only based on whichever one is greater in this case.
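The equivalence claimed here can be checked with a small standalone C sketch. Everything in it is an assumption made for illustration: a single range key stored as a double, partitions described by sorted exclusive upper bounds with the first one unbounded below, a uint32_t standing in for a Bitmapset, and a linear scan where real code would use a binary search over the bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Partition i covers [upper[i-1], upper[i]); partition 0 is unbounded below.
 * Returns a bitmap of the partitions that can hold some key x with x > v.
 */
uint32_t
prune_greater_than(const double *upper, size_t nparts, double v)
{
    uint32_t result = 0;

    assert(nparts <= 32);
    for (size_t i = 0; i < nparts; i++)
    {
        if (upper[i] > v)
            result |= (uint32_t) 1 << i;
    }
    return result;
}

/* x > v1 AND x > v2, pruning once per clause and intersecting. */
uint32_t
prune_by_intersection(const double *upper, size_t nparts, double v1, double v2)
{
    return prune_greater_than(upper, nparts, v1) &
           prune_greater_than(upper, nparts, v2);
}

/* Same condition, first reducing the two clauses to the tighter bound. */
uint32_t
prune_by_reduction(const double *upper, size_t nparts, double v1, double v2)
{
    double tightest = (v1 > v2) ? v1 : v2;

    return prune_greater_than(upper, nparts, tightest);
}
```

Both functions return the same set; they differ only in how many bound searches and set operations they need.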
* What do we encode into the Node tree attached to the plan? Clauses that
haven't gone through steps 2 and 3 (something like PartitionClauseInfo)
or the product of step 3 (something like PartScanKeyInfo)?
* How do we account for OR clauses? Perhaps by having the aforementioned
Node trees nested inside the top-level one, wherein there will be one
nested node per arm of an OR clause.
Suppose we define the notion of a pruning program. A pruning program
can use any number of registers, which have integer numbers starting
with 0 and counting upward as high as necessary. Each register holds
a Bitmapset. The result of a pruning program is the value of register
0 when the program completes. A pruning program consists of a list of
steps, each of which is either a PruningBaseStep or a
PruningCombineStep. A PruningCombineStep modifies the contents of the
target register based on the contents of a source register in one of
the following three ways: (1) UNION -- all bits set in source become
set in target; (2) INTERSECT -- all bits clear in source become clear
in target; (3) DIFFERENCE -- all bits set in source become clear in
target. A PruningBaseStep consists of a strategy (equality,
less-than, etc.), an output register, and list of expressions --
either as many as there are partition keys, or for range partitioning
perhaps fewer; it prunes based on the strategy and the expressions and
overwrites the output register with the partitions that would be
selected.
Example #1. Table is hash-partitioned on a and b. Given a query like
SELECT * FROM tab WHERE a = 100 AND b = 233, we create a single-step
program:
1. base-step (strategy =, register 0, expressions 100, 233)
If there were an equality constraint on only one of the two columns, we
would not create a pruning program at all, because no pruning is
possible.
Example #2. Table is list-partitioned on a. Given a query like SELECT
* FROM tab WHERE (a = $1 OR a = $2) AND a != $3, we create this
program:
1. base-step (strategy =, register 0, expressions $1)
2. base-step (strategy =, register 1, expressions $2)
3. base-step (strategy !=, register 2, expressions $3)
4. combine-step (target-register 0, source-register 1, strategy union)
5. combine-step (target-register 0, source-register 2, strategy difference)
(This is unoptimized -- one could do better by reversing steps 3 and 4
and reusing register 1 instead of needing register 2, but that
kind of optimization is probably not too important.)
Example #3. Table is range-partitioned on a and b. Given a query like
SELECT * FROM tab WHERE (a = 40 AND b > $1) OR (a = $2 AND b = $3), we
do this:
1. base-step (strategy >, register 0, expressions 40, $1)
2. base-step (strategy =, register 1, expressions $2, $3)
3. combine-step (target-register 0, source-register 1, strategy union)
You might need a few extra gadgets here to make all of this work --
e.g. another base-step strategy to handle ScalarArrayOpExpr; I'm just
trying to convey the basic idea here. It's pretty easy to see how to
store a program like this as a node tree: just create PruningBaseStep
and PruningCombineStep nodes and stick them into a List. At execution
time transform the List into an array and loop over it.
Or possibly it would be better to have two lists, one of base steps
without explicit register numbers, where step N always outputs to
register N, and then a second list of combine steps. Then at
execution time you could have an array of PruningBaseStep * and an
array of PruningCombineStep * instead of a combined array of Node *,
which might be quicker to process.
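A minimal standalone interpreter for the combined-array form of such a pruning program might look like the following C sketch. The type and function names are hypothetical, a uint32_t stands in for each Bitmapset register, and the table is assumed to be list-partitioned on one integer key with exactly one accepted value per partition. Under that assumption the sketch supports only an equality base step, so a != $3 is expressed as an equality base step followed by a DIFFERENCE combine step; that substitution is not valid for partitions accepting several values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 8

typedef enum { STEP_BASE, STEP_COMBINE } StepKind;
typedef enum { COMBINE_UNION, COMBINE_INTERSECT, COMBINE_DIFFERENCE } CombineOp;

typedef struct PruneStep
{
    StepKind  kind;
    int       target;  /* output register (base) or target register (combine) */
    int       source;  /* combine steps only */
    CombineOp op;      /* combine steps only */
    int       value;   /* base steps only: already-evaluated expression */
} PruneStep;

/*
 * list_values[p] is the single key value accepted by partition p.
 * Returns the contents of register 0 after running all steps.
 */
uint32_t
run_pruning_program(const PruneStep *steps, size_t nsteps,
                    const int *list_values, int nparts)
{
    uint32_t regs[MAX_REGS] = {0};

    assert(nparts >= 0 && nparts <= 32);
    for (size_t i = 0; i < nsteps; i++)
    {
        const PruneStep *s = &steps[i];

        assert(s->target >= 0 && s->target < MAX_REGS);
        if (s->kind == STEP_BASE)
        {
            /* Equality base step: overwrite the output register. */
            uint32_t out = 0;

            for (int p = 0; p < nparts; p++)
            {
                if (list_values[p] == s->value)
                    out |= (uint32_t) 1 << p;
            }
            regs[s->target] = out;
        }
        else
        {
            uint32_t src;

            assert(s->source >= 0 && s->source < MAX_REGS);
            src = regs[s->source];
            switch (s->op)
            {
                case COMBINE_UNION:
                    regs[s->target] |= src;
                    break;
                case COMBINE_INTERSECT:
                    regs[s->target] &= src;
                    break;
                case COMBINE_DIFFERENCE:
                    regs[s->target] &= ~src;
                    break;
            }
        }
    }
    return regs[0];
}
```

For Example #2 with partitions accepting 10, 20, 30 and 40 and parameter values $1 = 10, $2 = 30, $3 = 10, the five-step program leaves only the partition for 30 in register 0.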
But regardless of what you do exactly, I think you should try to come
up with some kind of representation that is basically uniform,
handling all the things you support in a similar fashion. The current
patch has basically separate and somewhat ad-hoc representations for
the regular case, the <> case, and the OR case, which I think is not
ideal because you end up with more code and a certain amount of
repeated logic.
--
Robert Haas
On 2 March 2018 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
2. All processing of clauses should happen at plan time, not run time,
so that we're not repeating work. If, for example, a prepared query
is executed repeatedly, the clauses should get fully processed when
it's planned, and then the only thing that should happen at execution
time is that we take the values to which we now have access and use
them to decide what to prune. With this design, we're bound to repeat
at least the portion of the work done by get_partitions_from_clauses()
at runtime, and as things stand, it looks like the current version of
the run-time partition pruning patch repeats the
generate_partition_clauses() work at runtime as well.
It seems to me as though what we ought to be doing is extracting
clauses that are of the correct general form and producing a list of
<partition-column-index>, <partition-operator-strategy>, <expression>
triplets sorted by increasing partition column index and operator
strategy number and in a form that can be represented as a Node tree.
This can be attached to the plan for use at runtime or, if there are
at least some constants among the expressions, used at plan time for
partition exclusion. So the main interfaces would be something like
this:
We did try this already as I also thought the same a while back and
even wrote a broken patch to do that. The thing is,
PartitionClauseInfo is designed to be optimal for having
get_partitions_from_clauses() called on it multiple times over, as
will likely happen when run-time pruning is pruning away unneeded
partitions during a parameterized nested loop join. To make it a Node
type, it's not quite as simple as changing arrays to Lists as the
keyclauses List contains PartClause, which are also not a Node type,
and that struct contains a FmgrInfo, which is also not a node type, so
we'd need to go try to make all those Node types (I doubt that's going
to happen) ... but ...
It's probably not impossible to come up with some intermediate
partially processed type that can be a Node type. Ideally, this could
be quickly converted into a PartitionClauseInfo during execution. The
problem with this is that this adds a rather silly additional step
during query planning to convert the intermediate processed list into
the fully processed PartitionClauseInfo that get_partitions_from_clauses
would still need.
I don't think building it is going to cost a huge amount. Presumably,
there are not many partitioned tables with 10 rows, so probably having
the get_partitions_from_clauses work as quickly as possible is better
than saving 100 nanoseconds in executor startup.
That being said, there's still a small issue with the run-time pruning
patch which is caused by me not pre-processing the clauses during
planning. Ideally, I'd be able to pre-process at least enough to
determine if any Params match the partition key so that I know if
run-time pruning can be used or not. As of now, I'm not doing that as
it seems wasteful to pre-process during planning just to get the Param
Ids out, then not be able to carry the pre-processed ones over to the
executor. We also can't really reuse the pre-processed state that was
generated during the planner's calls to generate_partition_clauses()
since we'll additionally also be passing in the parameterized path
clauses as well as the baserestrictinfo clauses.
Given the above, I'm happy to listen to ideas on this, as I'm really
not sure what's best here.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3 March 2018 at 04:47, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
Looking at the rough interface sketch in your message, it seems that the
product of whatever steps we end up grouping into phase 1 should be
something that can be put into a Node tree (PartitionPruneContext?),
because we may need to attach it to the plan if some of the needed values
will only be made available during execution.
Right. You might also end up with two representations: a
Node-tree-style representation that contains all of the information we
need, and another, faster form into which it gets converted before
use.
hmm, I thought Amit typed PartitionPruneContext instead of
PartitionClauseInfo by mistake here.
PartitionPruneContext can't be a node type. The run-time pruning patch
needs to set the PlanState in this context so that the code can lookup
the Param values when being called from the executor. You can't make
PlanState a node type too!
Perhaps you can make some primnode type to store all the stuff from
RelOptInfo that's currently being stored in PartitionPruneContext.
That could go along with the plan to save the executor having to look
that stuff up. We can then make that Node type a field in the
PartitionPruneContext. You could even reuse that Node from when it
would first get generated during query planning pruning and keep it
around for use to pass to the executor for run-time pruning, but you'd
probably need to stuff it somewhere like the partition's RelOptInfo so
it could be reused again later. Unsure if that's worth the trouble or
not.
--
David Rowley http://www.2ndQuadrant.com/
On Fri, Mar 2, 2018 at 10:54 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
I don't think building it is going to cost a huge amount. Presumably,
there are not many partitioned tables with 10 rows, so probably having
the get_partitions_from_clauses work as quickly as possible is better
than saving 100 nanoseconds in executor startup.
I agree that one could go overboard with trying to push work from
executor time to planner time, but I don't think the current patch is
very close to the point of diminishing returns. It's doing nearly
everything at execution time.
That being said, there's still a small issue with the run-time pruning
patch which is caused by me not pre-processing the clauses during
planning. Ideally, I'd be able to pre-process at least enough to
determine if any Params match the partition key so that I know if
run-time pruning can be used or not. As of now, I'm not doing that as
it seems wasteful to pre-process during planning just to get the Param
Ids out, then not be able to carry the pre-processed ones over to the
executor. We also can't really reuse the pre-processed state that was
generated during the planner's calls to generate_partition_clauses()
since we'll additionally also be passing in the parameterized path
clauses as well as the baserestrictinfo clauses.
I think it should be possible to have a structure where all the work
of classifying clauses happens in the planner. By the time we get to
execution time, we should be able to know for sure which clauses are
relevant. For example, if the user says WHERE a = $1 + 3 AND b =
(random() * 100)::int, and the partition key is (a, b), we should be
able to figure out at plan time that the clause containing b is
useless (because it's volatile) and the clause containing a is useful
only if this is range-partitioning (because with hash-partitioning we
must have an equality clause for every partition key column to do anything). I
think it should also be possible to know which expressions need to be
computed at runtime -- in this case, $1 + 3 -- and to which columns of
the partition key they correspond -- in this case, the first. I just
proposed a data representation which could track all that stuff and
I'm sure there are other ways to do it, too.
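A rough standalone C sketch of that plan-time classification follows. It is not the proposed data representation: Clause, Classification and classify_clauses() are hypothetical, each clause is reduced to a few flags, and the range case assumes that a usable clause on the leading key column is enough to attempt pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PART_HASH, PART_RANGE } PartStrategy;

typedef struct Clause
{
    int  keyno;        /* 0-based partition key column the clause refers to */
    bool is_equality;
    bool is_volatile;  /* e.g. the expression contains random() */
    bool is_const;     /* value already known at plan time */
} Clause;

typedef struct Classification
{
    bool     can_prune;
    unsigned runtime_keys;  /* bit k set: key k needs a run-time expression */
} Classification;

Classification
classify_clauses(const Clause *clauses, size_t nclauses, int nkeys,
                 PartStrategy strategy)
{
    Classification out = { false, 0 };
    unsigned usable_keys = 0;

    assert(nkeys > 0 && nkeys <= 32);
    for (size_t i = 0; i < nclauses; i++)
    {
        const Clause *c = &clauses[i];

        if (c->is_volatile)
            continue;           /* volatile expressions are never usable */
        if (strategy == PART_HASH && !c->is_equality)
            continue;           /* hash pruning only understands equality */
        assert(c->keyno >= 0 && c->keyno < nkeys);
        usable_keys |= 1u << c->keyno;
        if (!c->is_const)
            out.runtime_keys |= 1u << c->keyno;
    }

    if (strategy == PART_HASH)
    {
        /* Need an equality clause on every key column. */
        unsigned all = (nkeys == 32) ? ~0u : ((1u << nkeys) - 1);

        out.can_prune = (usable_keys == all);
    }
    else
    {
        /* Assumption: a usable clause on the leading column suffices. */
        out.can_prune = (usable_keys & 1u) != 0;
    }

    if (!out.can_prune)
        out.runtime_keys = 0;
    return out;
}
```

For the example above (a = $1 + 3 on the first key, a volatile clause on b), the sketch reports that pruning is possible only for range partitioning and that only the first key column needs an expression evaluated at run time.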
I think that things like PartClause that include both an opno and
various bits of cached information, including FmgrInfo, are not a very
good idea. A lot of work has been done to maintain the separation of
immutable information -- like Plans or Exprs -- from the run-time
state they use -- PlanState or ExprState. I think we would do well to
follow that distinction here, too, even if it seems to introduce some
"silly" overhead at execution time. I think it will pay for itself in
future code maintenance and the ability to apply optimizations such as
JIT which benefit from good divisions in this case. It is not crazy
to imagine that the "pruning program" idea I floated in a previous
email could be folded into the JIT stuff Andres is doing where
something with a less-clean separation of concerns would run into
problems.
--
Robert Haas
On 3 March 2018 at 04:47, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
2. Removing redundant clauses from each list, that is,
remove_redundant_clauses() to produce lists with just one member per
operator strategy for each partition key,
I don't see that there's a real need to perform step 2 at all. I
mean, if you have x > $1 and x > $2 in the query, you can just compute
the set of partitions for the first clause, compute the set of
partitions for the second clause, and then intersect. That doesn't
seem obviously worse than deciding which of $1 and $2 is greater and
then pruning only based on whichever one is greater in this case.
This is an interesting idea. It may simplify the code quite a bit as
the clause reduction code is quite bulky. However, unless I'm
mistaken, for this to work, certain inputs will need significantly
more processing to determine the minimum set of matching partitions.
Let's look at the following perhaps unlikely case. (I picked an
extreme case to demonstrate why this may be an inferior method)
Given the table abc (...) partition by range (a,b,c), with the query:
select * from abc where a >= 1 and a >= 2 and a >= 3 and b >= 1 and b >= 2
and b = 3 and c >= 1 and c >= 2 and c = 3;
We would likely still be parsing those clauses into some struct like
PartitionClauseInfo and would end up with some arrays or Lists with
the clauses segmented by partition key.
It appears to me, for your method to work we'd need to try every
combination of the clauses matching each partition key, which in this
case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
the clause reduction which is 3 + 3 + 3 (O(N) per partition key)
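The arithmetic behind that comparison can be written down as a tiny standalone C sketch. The function names are made up for illustration, and the cost model is only the one stated above: one bound search per combination of clauses without reduction, versus one pass over each key's clauses followed by a single search.

```c
#include <assert.h>
#include <stddef.h>

/* Bound searches needed if every combination of one clause per key is tried. */
unsigned long
searches_per_combination(const unsigned *clauses_per_key, size_t nkeys)
{
    unsigned long total = 1;

    assert(nkeys > 0);
    for (size_t k = 0; k < nkeys; k++)
        total *= clauses_per_key[k];
    return total;
}

/* Clauses visited by a reduction pass that is O(N) per partition key. */
unsigned long
clauses_visited_by_reduction(const unsigned *clauses_per_key, size_t nkeys)
{
    unsigned long total = 0;

    assert(nkeys > 0);
    for (size_t k = 0; k < nkeys; k++)
        total += clauses_per_key[k];
    return total;
}
```

For three keys with three clauses each, this gives 27 searches against 9 clause visits plus 1 search.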
I've tried to think of a more genuine poor performing case for this
with IN or NOT IN lists, but I can't quite see it since NOT IN will
only be supported by LIST partitioning, which can only have a single
partition key and IN would be OR conditions, each of which would be
evaluated in a different round of looping. Although I'm not ruling out
that my imagination is just not good enough.
With that considered, is it still a good idea to do it this way?
Or maybe I've misunderstood the idea completely?
--
David Rowley http://www.2ndQuadrant.com/
On 2018/03/02 21:43, Robert Haas wrote:
On Fri, Mar 2, 2018 at 1:22 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
But I realized we don't need the coercion. Earlier steps would have
determined that the clause from which the expression originated contains
an operator that is compatible with the partitioning operator family. If
so, the type of the expression in question, even though different from the
partition key type, would be binary coercible with it.
That doesn't follow. Knowing that two types are in the same operator
family doesn't guarantee that the types are binary coercible. For
example, int8 is not binary-coercible to int2. Moreover, you'd better
be pretty careful about trying to cast int8 to int2 because it might
turn a query that would have returned no rows into one that fails
outright; that's not OK. Imagine that the user types:
SELECT * FROM partitioned_by_int2 WHERE a = 1000000000000;
I think what needs to happen with cross-type situations is that you
look in the opfamily for a comparator that takes the types you want as
input; if you can't find one, you have to give up on pruning. If you
do find one, then you use it. For example in the above query, once
you find btint28cmp, you can use that to compare the user-provided
constant against the range bounds for the various partitions to see
which one might contain it. You'll end up selecting the partition
with upper bound MAXVALUE if there is one, or no partition at all if
every partition has a finite upper bound. That's as well as we can do
with current infrastructure, I think.
Hmm, yes.
So while the patch's previous approach to convert the query's constant
value to the desired type was wrong, this is wronger. :-(
I guess I'll need to change the patch such that the comparison function
used for comparing partition bounds with a query-specified constant
changes from the default one in the PartitionKey to one that accepts the
type of that constant.
Thanks,
Amit
On 2018/03/03 0:47, Robert Haas wrote:
On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote wrote:
Given the patch's implementation, we'll have to make the structure of that
Node tree a bit more complex than a simple List. For one thing, the patch
handles OR clauses by performing pruning separately for each arm and then
combining partitions selected across OR arms using set union. By
"performing pruning" in the last sentence I meant following steps similar
to ones you wrote in your message:
1. Segregating pruning clauses into per-partition-key Lists, that is,
generate_partition_clauses() producing a PartitionClauseInfo,
2. Removing redundant clauses from each list, that is,
remove_redundant_clauses() to produce lists with just one member per
operator strategy for each partition key,
3. Extracting Datum values from the clauses to form equal/min/max tuples
and setting null or not null bits for individual keys, that is,
extract_bounding_datums() producing a PartScanKeyInfo, and
4. Finally pruning with those Datum tuples and null/not null info, that
is, get_partitions_for_keys().
Steps 2-4 are dependent on clauses providing Datums, which all the clauses
may or may not do. Depending on whether or not, we'll have to defer those
steps to run time.
I don't see that there's a real need to perform step 2 at all. I
mean, if you have x > $1 and x > $2 in the query, you can just compute
the set of partitions for the first clause, compute the set of
partitions for the second clause, and then intersect. That doesn't
seem obviously worse than deciding which of $1 and $2 is greater and
then pruning only based on whichever one is greater in this case.
If we can accommodate this step with your "pruning program" idea below, I
think we should still try to keep the remove_redundant_clauses() step.
If we can keep this step, x > $1 and x > $2 will require 1 comparison + 1
bsearch over bounds, whereas without it, it'd be 2 bsearches over bounds +
1 Bitmapset operation. I think David expressed a similar concern about
the performance in his reply down-thread.
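To make the two costs concrete, here is a small sketch in C, not taken from the patch. It assumes a single integer key, at most 32 partitions represented as a bitmask, and invented helper names (prune_gt, prune_two_by_intersection, prune_two_by_reduction). It also assumes both constants have the same type, so that they can be compared with each other at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * split[] holds nsplits ascending split points, giving nsplits + 1 range
 * partitions: partition 0 lies below split[0], partition i covers
 * [split[i-1], split[i]), and the last one is unbounded above.
 * Returns a bitmask of the partitions that may hold rows with x > v.
 */
uint32_t
prune_gt(const int *split, size_t nsplits, int v)
{
    size_t   lo = 0;
    size_t   hi = nsplits;
    uint32_t all;
    uint32_t below;

    assert(nsplits < 32);       /* at most 32 partitions fit in the mask */

    /* One binary search: first split point strictly greater than v. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (split[mid] > v)
            hi = mid;
        else
            lo = mid + 1;
    }

    all = (nsplits == 31) ? UINT32_MAX
                          : (UINT32_C(1) << (nsplits + 1)) - 1;
    below = (UINT32_C(1) << lo) - 1;    /* lo <= nsplits <= 31 */
    return all & ~below;
}

/* No clause reduction: two searches, then one set intersection. */
uint32_t
prune_two_by_intersection(const int *split, size_t nsplits, int v1, int v2)
{
    return prune_gt(split, nsplits, v1) & prune_gt(split, nsplits, v2);
}

/* Clause reduction first: one comparison, then a single search. */
uint32_t
prune_two_by_reduction(const int *split, size_t nsplits, int v1, int v2)
{
    return prune_gt(split, nsplits, v1 > v2 ? v1 : v2);
}
```

Both functions return the same set of partitions; they differ only in how much work is done to get there.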
That too as long as we implement this step such that having it in the
pipeline doesn't restrict the representation that we need to have the
"pruning steps" in. Let's see.
* What do we encode into the Node tree attached to the plan? Clauses that
haven't gone through steps 2 and 3 (something like PartitionClauseInfo)
or the product of step 3 (something like PartScanKeyInfo)?
* How do we account for OR clauses? Perhaps by having the aforementioned
Node trees nested inside the top-level one, wherein there will be one
nested node per arm of an OR clause.
Suppose we define the notion of a pruning program. A pruning program
can use any number of registers, which have integer numbers starting
with 0 and counting upward as high as necessary. Each register holds
a Bitmapset. The result of a pruning program is the value of register
0 when the program completes. A pruning program consists of a list of
steps, each of which is either a PruningBaseStep or a
PruningCombineStep. A PruningCombineStep modifies the contents of the
target register based on the contents of a source register in one of
the following three ways: (1) UNION -- all bits set in source become
set in target; (2) INTERSECT -- all bits clear in source become clear
in target; (3) DIFFERENCE -- all bits set in source become clear in
target. A PruningBaseStep consists of a strategy (equality,
less-than, etc.), an output register, and list of expressions --
either as many as there are partition keys, or for range partitioning
perhaps fewer; it prunes based on the strategy and the expressions and
overwrites the output register with the partitions that would be
selected.
Example #1. Table is hash-partitioned on a and b. Given a query like
SELECT * FROM tab WHERE a = 100 AND b = 233, we create a single-step
program:
1. base-step (strategy =, register 0, expressions 100, 233)
If there were an equality constraint on only one of the two columns, we
would not create a pruning program at all, because no pruning is
possible.
Example #2. Table is list-partitioned on a. Given a query like SELECT
* FROM tab WHERE (a = $1 OR a = $2) AND a != $3, we create this
program:
1. base-step (strategy =, register 0, expressions $1)
2. base-step (strategy =, register 1, expressions $2)
3. base-step (strategy !=, register 2, expressions $3)
4. combine-step (target-register 0, source-register 1, strategy union)
5. combine-step (target-register 0, source-register 2, strategy difference)
(This is unoptimized -- one could do better by reversing steps 3 and 4
and reusing register 1 instead of needing register 2, but that
kind of optimization is probably not too important.)
Example #3. Table is range-partitioned on a and b. Given a query like
SELECT * FROM tab WHERE (a = 40 AND b > $1) OR (a = $2 AND b = $3), we
do this:
1. base-step (strategy >, register 0, expressions 40, $1)
2. base-step (strategy =, register 1, expressions $2, $3)
3. combine-step (target-register 0, source-register 1, strategy union)
You might need a few extra gadgets here to make all of this work --
e.g. another base-step strategy to handle ScalarArrayOpExpr; I'm just
trying to convey the basic idea here. It's pretty easy to see how to
store a program like this as a node tree: just create PruningBaseStep
and PruningCombineStep nodes and stick them into a List. At execution
time transform the List into an array and loop over it.
Or possibly it would be better to have two lists, one of base steps
without explicit register numbers, where step N always outputs to
register N, and then a second list of combine steps. Then at
execution time you could have an array of PruningBaseStep * and an
array of PruningCombineStep * instead of a combined array of Node *,
which might be quicker to process.
But regardless of what you do exactly, I think you should try to come
up with some kind of representation that is basically uniform,
handling all the things you support in a similar fashion. The current
patch has basically separate and somewhat ad-hoc representations for
the regular case, the <> case, and the OR case, which I think is not
ideal because you end up with more code and a certain amount of
repeated logic.
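For readers who want to see the register idea run, the following is a rough sketch in C of an interpreter for such a program. Everything in it is an assumption made for illustration: the type and function names (PruneStep, run_pruning_program and so on), the limit of 32 partitions held in a bitmask, and a list-partitioned table where each partition accepts exactly one integer value. One deliberate choice: the != base step here returns the partitions that survive, so it is combined with INTERSECT; the DIFFERENCE in Example #2 would fit if that step instead returned the partitions to exclude.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 8

typedef enum { STRAT_EQ, STRAT_NE } BaseStrategy;
typedef enum { COMBINE_UNION, COMBINE_INTERSECT, COMBINE_DIFFERENCE } CombineOp;
typedef enum { STEP_BASE, STEP_COMBINE } StepKind;

/* One step of a hypothetical pruning program. */
typedef struct
{
    StepKind     kind;
    /* used when kind == STEP_BASE */
    BaseStrategy strategy;
    int          out_reg;
    int          value;         /* already-evaluated expression */
    /* used when kind == STEP_COMBINE */
    CombineOp    op;
    int          target_reg;
    int          source_reg;
} PruneStep;

/* part_value[i] is the single list value accepted by partition i. */
static uint32_t
run_base_step(const int *part_value, size_t nparts,
              BaseStrategy strategy, int value)
{
    uint32_t result = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        int match = (part_value[i] == value);

        if ((strategy == STRAT_EQ && match) ||
            (strategy == STRAT_NE && !match))
            result |= UINT32_C(1) << i;
    }
    return result;
}

/* Runs all steps in order; the result is whatever register 0 holds. */
uint32_t
run_pruning_program(const int *part_value, size_t nparts,
                    const PruneStep *steps, size_t nsteps)
{
    uint32_t reg[MAX_REGS] = {0};

    assert(nparts <= 32);

    for (size_t i = 0; i < nsteps; i++)
    {
        const PruneStep *st = &steps[i];

        if (st->kind == STEP_BASE)
        {
            assert(st->out_reg >= 0 && st->out_reg < MAX_REGS);
            reg[st->out_reg] = run_base_step(part_value, nparts,
                                             st->strategy, st->value);
            continue;
        }

        assert(st->target_reg >= 0 && st->target_reg < MAX_REGS);
        assert(st->source_reg >= 0 && st->source_reg < MAX_REGS);
        switch (st->op)
        {
            case COMBINE_UNION:
                reg[st->target_reg] |= reg[st->source_reg];
                break;
            case COMBINE_INTERSECT:
                reg[st->target_reg] &= reg[st->source_reg];
                break;
            case COMBINE_DIFFERENCE:
                reg[st->target_reg] &= ~reg[st->source_reg];
                break;
        }
    }
    return reg[0];
}
```

For partitions holding the values 1, 2, 3 and 4, a program for (a = 1 OR a = 3) AND a != 3 built from three base steps, a union and an intersect leaves only the first partition selected.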
Thanks for outlining this idea.
Given that the most important part in your outline seems to be the clean
Node structure to carry the information from one stage to another, that
will result by adopting the "partition pruning primitives" described
above, I will first try to implement them as a shell around the code that
I and David have debugged till now. Then, once everything seems to work,
I will start dropping unneeded structures and code that seems to be
duplicative, which I'm beginning to suspect there will be plenty of. I'll
post an update in a couple of days to report on how that works out.
Regards,
Amit
On Fri, Mar 2, 2018 at 7:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Let's look at the following perhaps unlikely case. (I picked an
extreme case to demonstrate why this may be an inferior method)
Given the table abc (...) partition by range (a,b,c), with the query:
select * from abc where a >= 1 and a >= 2 and a >= 3 and b >= 1 and
b >= 2 and b = 3 and c >= 1 and c >= 2 and c = 3;
We would likely still be parsing those clauses into some struct like
PartitionClauseInfo and would end up with some arrays or Lists with
the clauses segmented by partition key.
It appears to me, for your method to work we'd need to try every
combination of the clauses matching each partition key, which in this
case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
the clause reduction which is 3 + 3 + 3 (O(N) per partition key)
[...]
With that considered, is it still a good idea to do it this way?
I dunno. What do you think?
That case is indeed pretty unfortunate, but it's also pretty
artificial. It's not obvious to me that we shouldn't care about it,
but it's also not obvious to me that we should. If we have some
bizarre cases that slip through the cracks or don't perform terribly
well, maybe nobody would ever notice or care. On the other hand,
maybe they would.
I suppose in my ideal world, this could be handled by building a
GREATEST or LEAST expression. In other words, if someone says foo >=
1 AND foo >= 2, instead of doing separate pruning steps, we'd just
prune once based on foo >= GREATEST(1,2). But that doesn't really
work, because there's no provision to tell MinMaxExpr from which
opfamily we wish to draw the operator used to compare 1 and 2 and no
guarantee that such an operator exists for the actual data types of 1
and 2. (Imagine that 1 and 2 are of different data types; the relevant
opfamily might have an operator that can compare a value of the same
type as foo to 1 and similarly for 2, but no operator that can compare
1 and 2 to each other.)
One thing that we could do is just only accept one clause for each
column-strategy pairing, presumably either the first one or the last
one. So in your example either a >= 1 or a >= 3 would be accepted and
the others would be discarded for purposes of partition pruning. If a
user complains, we could tell THEM to manually do the rewrite
suggested in the previous step, and just write a >= GREATEST(1,2,3).
(And of course if it's that simple, they might want to then
pre-simplify to a >= 3!)
Another alternative is to include some kind of additional type of step
in the "pruning program" which can do this GREATEST/LEAST operation
... but that's adding quite a bit of complexity for what seems like
it's pretty much a corner case, and as noted above, there's no
guarantee that we even have the correct operator available. It should
be fine if the partitioning column/expression and all of the constants
being compared are of the same type, and in practice *most* of the
time even when they're not, but we're going to have to have some
handling for the strange cases -- and I think the only real choices
are "try every combination and maybe be slow", "try 1 combination and
maybe fail to prune everything that could have been pruned", and some
intermediate possibilities along the same lines.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 7 March 2018 at 10:15, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 2, 2018 at 7:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
It appears to me, for your method to work we'd need to try every
combination of the clauses matching each partition key, which in this
case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
the clause reduction which is 3 + 3 + 3 (O(N) per partition key)
[...]
With that considered, is it still a good idea to do it this way?
I dunno. What do you think?
That case is indeed pretty unfortunate, but it's also pretty
artificial. It's not obvious to me that we shouldn't care about it,
but it's also not obvious to me that we should. If we have some
bizarre cases that slip through the cracks or don't perform terribly
well, maybe nobody would ever notice or care. On the other hand,
maybe they would.
One thing I've learned in my time working with PostgreSQL is that, if
there's a known hole, someone's probably going to fall down it
eventually. I like working with PostgreSQL because we're pretty
careful to not make holes that people can fall down, or if there is
some hole that cannot be filled in, we try to put a fence around it
with a sign (e.g. the rename of pg_xlog to pg_wal). I'm not strongly opposed
to your ideas, I probably don't have a complete understanding of the
idea anyway. But from what I understand it looks like you want to take
something that works quite well and make it work less well, and there
appears not to be a good reason provided of why you want to do that.
Is it because you want to simplify the patch due to concerns about it
being too much logic to get right for PG11?
One thing that we could do is just only accept one clause for each
column-strategy pairing, presumably either the first one or the last
one.
The problem with that is it can cause surprising behaviour. We reorder
clauses and clauses get pushed down from upper parts of the query.
Let's say there was some view like:
CREATE VIEW vw_ledger_2018 AS SELECT * FROM ledger WHERE postdate
BETWEEN '2018-01-01' AND '2018-12-31';
And a user comes along and does:
SELECT * FROM vw_ledger_2018 WHERE postdate BETWEEN '2018-03-01' AND
'2018-03-31'
We're going to end up with base quals something like: postdate >=
'2018-01-01' AND postdate <= '2018-12-31' AND postdate >= '2018-03-01'
AND postdate <= '2018-03-31'
If we just take the first from each op strategy then we'll not have
managed to narrow the case down to just the March partition. You might
argue that this should be resolved at some higher level in the
planner, but that does nothing for the run-time pruning case.
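The difference can be sketched in a few lines of C. This is only an illustration under loud assumptions: the ledger table is taken to have one partition per month of 2018, dates are encoded as yyyymmdd integers, and the names Qual, prune_first_qual_only and prune_all_quals are invented here rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { QUAL_GE, QUAL_LE } QualOp;

typedef struct
{
    QualOp op;
    int    date;                /* yyyymmdd, assumed to fall within 2018 */
} Qual;

static int
month_of(int date)
{
    assert(date / 10000 == 2018);
    return (date / 100) % 100;
}

/* Bitmask with bit (m - 1) set for every month m in [lo, hi]. */
static uint32_t
months_between(int lo, int hi)
{
    uint32_t mask = 0;

    for (int m = lo; m <= hi; m++)
        mask |= UINT32_C(1) << (m - 1);
    return mask;
}

/* Keep only the first qual seen for each operator strategy. */
uint32_t
prune_first_qual_only(const Qual *quals, size_t n)
{
    int lo = 1, hi = 12;
    int seen_ge = 0, seen_le = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (quals[i].op == QUAL_GE && !seen_ge)
        {
            lo = month_of(quals[i].date);
            seen_ge = 1;
        }
        else if (quals[i].op == QUAL_LE && !seen_le)
        {
            hi = month_of(quals[i].date);
            seen_le = 1;
        }
    }
    return months_between(lo, hi);
}

/* Consider every qual and keep the most restrictive bound on each side. */
uint32_t
prune_all_quals(const Qual *quals, size_t n)
{
    int lo = 1, hi = 12;

    for (size_t i = 0; i < n; i++)
    {
        int m = month_of(quals[i].date);

        if (quals[i].op == QUAL_GE && m > lo)
            lo = m;
        else if (quals[i].op == QUAL_LE && m < hi)
            hi = m;
    }
    return months_between(lo, hi);
}
```

Fed the four base quals listed above in that order, the first function keeps all twelve monthly partitions while the second keeps only March.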
I don't really want to do or say anything that jeopardises this patch
from getting into PG11, so if the path of least resistance is to go
with the option you've proposed then I'd much rather that than this
getting pushed out to PG12. I really just want to try to make sure
we've thought of everything before we create too many surprises for
users.
Perhaps a compromise would be to check all quals from the first
partition key and only the first or last one from the remaining keys.
I imagine most cases will have just 1 key anyway. This would
significantly reduce the number of possible combinations of quals to
try, but unfortunately, it still does have that element of surprise.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 6, 2018 at 8:34 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
One thing I've learned in my time working with PostgreSQL is that, if
there's a known hole, someone's probably going to fall down it
eventually. I like working with PostgreSQL because we're pretty
careful to not make holes that people can fall down, or if there is
some hole that cannot be filled in, we try to put a fence around it
with a sign (e.g. the rename of pg_xlog to pg_wal). I'm not strongly opposed
to your ideas, I probably don't have a complete understanding of the
idea anyway. But from what I understand it looks like you want to take
something that works quite well and make it work less well, and there
appears not to be a good reason provided of why you want to do that.
Is it because you want to simplify the patch due to concerns about it
being too much logic to get right for PG11?
My understanding is that the patch as submitted is fundamentally
broken in multiple ways.
As Amit said a few emails upthread, "So while the patch's previous
approach to convert the query's constant value to the desired type was
wrong, this is wronger. :-(" I agree with that analysis. As I tried
to explain in my last email, if you've got something like foo >
'blarfle'::type1 and foo > 'hoge'::type2, there may actually be no way
at all to determine which of those clauses is more restrictive. The
fact that > was used in the query to compare foo with a value of type1
and, separately, with a value of type2 means that those operators
exist, but it does not follow that the opfamily provides an operator
which can compare type1 to type2. As far as I can see, what this
means is that, in general, the approach the patch takes to eliminating
redundant clauses just doesn't work; and in the general case I don't
think there's much hope of saving it. The question of whether the
patch does too much work at execution time or not is maybe arguable --
my position is that it does -- but working properly in the face of
cross-type comparisons is non-negotiable.
The use of evaluate_expr() is also completely wrong and has got to be
fixed. I already wrote about that upthread so I won't repeat it here.
I'm pretty sure that the current design, if allowed to stand, would
have lots of bad consequences.
As I understand it, Amit is currently hacking on the patch to try to
fix these issues. If he comes up with something that works properly
with cross-type comparisons and doesn't abuse evaluate_expr() but
still does more work than I'd ideally prefer at execution time, I'll
consider holding my nose and accepting it anyway. But considering the
amount of rework that I think is needed, I don't really see why we
wouldn't adopt a design that minimizes execution time work, too.
In short, I don't think I'm trying to make something that works quite
well work less well, because I don't think the patch as it stands can
be correctly described as working quite well.
Let's say there was some view like:
CREATE VIEW vw_ledger_2018 AS SELECT * FROM ledger WHERE postdate
BETWEEN '2018-01-01' AND '2018-12-31';
And a user comes along and does:
SELECT * FROM vw_ledger_2018 WHERE postdate BETWEEN '2018-03-01' AND
'2018-03-31'
If we just take the first from each op strategy then we'll not have
managed to narrow the case down to just the March partition. You might
argue that this should be resolved at some higher level in the
planner, but that does nothing for the run-time pruning case.
Yeah, that's a good example of how this could happen in real life.
So far I see three ways forward here:
1. If we've got multiple redundant quals, ignore all but one of them
for purposes of partition pruning. Hope users don't get mad.
2. If we've got multiple redundant quals, do multiple checks. Hope
this isn't annoyingly slow (or too much ugly code).
3. If we've got multiple redundant quals but no cross-type operators
are in use, evaluate all of the expressions and pick the highest or
lowest value as appropriate. Otherwise fall back to #1 or #2. For
bonus points, do this when cross-type operators ARE in use and the
additional cross-type operators that we need to figure out the highest
or lowest value, as appropriate, are also available.
I'm OK with any of those approaches; that is, I will not seek to block
a patch merely on the basis of which of those options it chooses. I
think they are all defensible. Options that are not OK include (a)
trying to cast a value of one type to another type, because that could
turn a query that would have simply returned no rows into an error
case or (b) supposing that all types in an opfamily are binary
coercible to each other, because that's just plain wrong.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi.
On 2018/03/05 17:38, Amit Langote wrote:
I'll
post an update in a couple of days to report on how that works out.
I'm still working on this and getting most of the tests to pass with the
new code, but not all of them yet.
Thanks,
Amit
On 2018/03/07 20:58, Amit Langote wrote:
Hi.
On 2018/03/05 17:38, Amit Langote wrote:
I'll
post an update in a couple of days to report on how that works out.
I'm still working on this and getting most of the tests to pass with the
new code, but not all of them yet.
Sorry about the delay.
Attached is a significantly revised version of the patch, although I admit
it could still use some work with regard to comments and other cleanup.
The rewrite introduces a notion of PartitionPruneStep nodes based on the
ideas described in [1]. So, instead of aggregating *all* of the pruning
clauses into a PartitionClauseInfo which was hard to serialize into a node
tree and then a PartScanKeyInfo (both of which no longer exist), this
generates a list of nodes. Each node inherits from the base
PartitionPruneStep node type and contains information enough to perform
partition pruning by directly comparing the information with partition
bounds or contains sub-nodes that do. For example, a PartitionPruneStepOp
step contains an integer telling the partitioning operator strategy (such
as various btree operator strategies) and a tuple to compare against
partition bounds stored in the relcache. A PartitionPruneStepCombine step
contains arguments that are in turn pruning steps themselves, which are
separately executed and partition sets obtained thereby are combined using
the specified combineOp.
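As a rough picture of that shape, and only that, here is a standalone C sketch of a step tree evaluated recursively. The struct layout and every name in it (TreeStep, TS_OP, TC_UNION, exec_step and so on) are assumptions for illustration and are not the node definitions from the attached patch. It assumes a single integer range key described by ascending split points and at most 32 partitions kept in a bitmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TS_OP, TS_COMBINE } StepTag;
typedef enum { STRAT_LESS, STRAT_EQUAL, STRAT_GREATER } Strategy;
typedef enum { TC_UNION, TC_INTERSECT } CombineKind;

typedef struct TreeStep
{
    StepTag     tag;
    /* used when tag == TS_OP */
    Strategy    strategy;
    int         value;
    /* used when tag == TS_COMBINE */
    CombineKind combine;
    const struct TreeStep **args;   /* sub-steps, executed separately */
    size_t      nargs;
} TreeStep;

/* Bitmask with bits from..to (inclusive) set. */
static uint32_t
range_mask(size_t from, size_t to)
{
    uint32_t mask = 0;

    for (size_t i = from; i <= to; i++)
        mask |= UINT32_C(1) << i;
    return mask;
}

/*
 * split[] holds nsplits ascending split points; partition i covers
 * [split[i-1], split[i]), with the first and last partitions unbounded.
 */
static uint32_t
exec_op(const int *split, size_t nsplits, Strategy strategy, int value)
{
    size_t le = 0;              /* split points <= value */
    size_t lt = 0;              /* split points <  value */

    for (size_t i = 0; i < nsplits; i++)
    {
        if (split[i] <= value)
            le++;
        if (split[i] < value)
            lt++;
    }

    switch (strategy)
    {
        case STRAT_EQUAL:
            return UINT32_C(1) << le;
        case STRAT_GREATER:
            return range_mask(le, nsplits);
        case STRAT_LESS:
            return range_mask(0, lt);
    }
    return range_mask(0, nsplits);  /* unknown strategy: prune nothing */
}

uint32_t
exec_step(const int *split, size_t nsplits, const TreeStep *step)
{
    uint32_t result;

    assert(nsplits < 32);

    if (step->tag == TS_OP)
        return exec_op(split, nsplits, step->strategy, step->value);

    /* Start from "all partitions" for AND-like, "none" for OR-like. */
    result = (step->combine == TC_INTERSECT) ? range_mask(0, nsplits) : 0;
    for (size_t i = 0; i < step->nargs; i++)
    {
        uint32_t sub = exec_step(split, nsplits, step->args[i]);

        if (step->combine == TC_INTERSECT)
            result &= sub;
        else
            result |= sub;
    }
    return result;
}
```

With split points 10, 20 and 30, a tree for a = 5 OR (a > 12 AND a < 18) selects the first two of the four partitions.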
Also, fixed a bug of the previous design as detailed in [2]. So, with the
patch:
create table lparted (a smallint) partition by list (a);
create table lparted_1 partition of lparted for values in (1);
create table lparted_16384 partition of lparted for values in (16384);
-- all partitions pruned (lparted_16384 wouldn't be pruned by previous
-- patches due to comparison using a bogus partsupfunc)
explain (costs off) select * from lparted where a = 100000000000000;
QUERY PLAN
--------------------------
Result
One-Time Filter: false
(2 rows)
Also,
create table rparted (a smallint) partition by range (a);
create table rparted_1 partition of rparted for values from (1) to (10);
create table rparted_16384 partition of rparted for values from (10) to
(16384);
create table rparted_maxvalue partition of rparted for values from (16384)
to (maxvalue);
-- all partitions except rparted_maxvalue pruned
explain (costs off) select * from rparted where a > 100000000000000;
QUERY PLAN
-------------------------------------------------
Append
-> Seq Scan on rparted_maxvalue
Filter: (a > '100000000000000'::bigint)
(3 rows)
I will continue working on improving the comments / cleaning things up and
post a revised version soon, but until then please look at the attached.
Thanks,
Amit
[1]: /messages/by-id/CA+TgmoahUxagjeNeJTcJkD0rbk+mHTXROzWcEd+tZ8DuQG83cg@mail.gmail.com
[2]: /messages/by-id/CA+TgmoYtKitwsFtA4+6cdeYGEfnS1+OY+G=Ue26fgSzJZx=eJg@mail.gmail.com
Attachments:
v36-0001-Add-partsupfunc-to-PartitionSchemeData.patch
From 6452a6fff1b3c6ef8aaecc35ad6c9a164958d87a Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v36 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..709a00924e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d576aa7350..08a177dac4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v36-0002-Add-more-tests-for-partition-pruning.patch
From f66d86c23dd3eb6f2ada87653f22bc87f6c7811f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v36 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 476 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 120 ++++++-
2 files changed, 594 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..775bba6547 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,478 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b IS NULL))
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b IS NULL))
+ -> Seq Scan on hp3
+ Filter: ((a IS NULL) AND (b IS NULL))
+(9 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+ -> Seq Scan on hp1
+ Filter: ((b IS NULL) AND (a = 1))
+ -> Seq Scan on hp2
+ Filter: ((b IS NULL) AND (a = 1))
+ -> Seq Scan on hp3
+ Filter: ((b IS NULL) AND (a = 1))
+(9 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(9 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp1
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(9 rows)
+
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..317ff479aa 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,122 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range or hash partitions
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, hp, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
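
A note on the hash-partitioning tests above: their comments say that partial keys won't prune and that equality on every key is needed. A minimal sketch of why, as a toy model only: the combiner toy_row_hash() and the function toy_hash_prune() are hypothetical names, and the arithmetic is not PostgreSQL's actual hash. The point is just that the partition is picked from a hash over all key columns, so a missing key leaves the remainder undetermined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical combiner (NOT PostgreSQL's hash function): fold the
 * per-key hash values into a single row hash.
 */
static uint64_t
toy_row_hash(const uint64_t *keyhash, int nkeys)
{
    uint64_t    h = 0;
    int         i;

    for (i = 0; i < nkeys; i++)
        h = h * 31 + keyhash[i];

    return h;
}

/*
 * Returns the remainder (0 .. modulus - 1) identifying the one partition
 * that can hold the row, or -1 if some key value is unknown, in which case
 * no partition can be excluded.
 */
static int
toy_hash_prune(const uint64_t *keyhash, const bool *known,
               int nkeys, int modulus)
{
    int         i;

    assert(modulus > 0);

    for (i = 0; i < nkeys; i++)
    {
        if (!known[i])
            return -1;          /* partial key: cannot compute the hash */
    }

    return (int) (toy_row_hash(keyhash, nkeys) % (uint64_t) modulus);
}
```

With modulus 4, as in the hp table, a query that constrains only column a corresponds to known[] = {true, false}, and the sketch gives up with -1.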
v36-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 0e9903d90e4be2a3f76622d70129a62a3684f640 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v36 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in the partitioned parent table's boundinfo,
after extracting the clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
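
Reviewer note (text here is ignored when applying the patch): the way get_unpruned_partitions() in this patch folds the pruning steps together can be pictured with a toy model. This is only a sketch under assumptions: PartMask, all_partitions() and prune_with_steps() are made-up names, and a 64-bit mask stands in for a Bitmapset. It covers only the steps that narrow the result by intersection, not the combine step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means partition i survives. */
typedef uint64_t PartMask;

/* Mask with bits 0 .. nparts - 1 set; nparts must be between 1 and 64. */
static PartMask
all_partitions(int nparts)
{
    assert(nparts >= 1 && nparts <= 64);

    return (nparts == 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
}

/*
 * Start from "scan every partition" and intersect the running result with
 * the set of partitions selected by each step, in order.
 */
static PartMask
prune_with_steps(int nparts, const PartMask *step_parts, size_t nsteps)
{
    PartMask    result = all_partitions(nparts);
    size_t      i;

    for (i = 0; i < nsteps; i++)
        result &= step_parts[i];

    return result;
}
```

For four partitions and two steps selecting {0,1,2} and {1,2,3}, the surviving set is {1,2}; with no steps at all, every partition survives.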
src/backend/catalog/partition.c | 869 +++++++++++++++++
src/backend/nodes/copyfuncs.c | 68 ++
src/backend/nodes/nodeFuncs.c | 51 +
src/backend/optimizer/path/allpaths.c | 16 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1252 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 35 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 5 +
src/include/nodes/primnodes.h | 48 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 157 ++--
src/test/regress/sql/partition_prune.sql | 2 +-
17 files changed, 2487 insertions(+), 112 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
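
Reviewer note (ignored when applying the patch): the offset arithmetic that get_partitions_for_keys() uses for > and >= on list partitions can be modelled on plain integers. This is a sketch under assumptions: toy_bound_bsearch() and toy_min_offset() are hypothetical helpers written for this note, and the real code goes through partition_list_bsearch() with the partition support function and collation. The search returns the greatest bound that is <= the value, or -1 if there is none.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Binary search over sorted, distinct bounds.  Returns the index of the
 * greatest bound <= value, or -1 if all bounds are greater.  *is_equal is
 * set to true only when the returned bound equals value.
 */
static int
toy_bound_bsearch(const int *bounds, int nbounds, int value, bool *is_equal)
{
    int         lo = -1;
    int         hi = nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (bounds[mid] <= value)
        {
            lo = mid;
            *is_equal = (bounds[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }

    return lo;
}

/*
 * First bound offset that can satisfy "key > value" (inclusive = false) or
 * "key >= value" (inclusive = true).  A result equal to nbounds means that
 * no bound qualifies.
 */
static int
toy_min_offset(const int *bounds, int nbounds, int value, bool inclusive)
{
    bool        is_equal;
    int         off = toy_bound_bsearch(bounds, nbounds, value, &is_equal);

    if (off < 0)
        return 0;               /* every bound is greater than value */

    /* Skip the found bound unless it equals value and the op is >= */
    if (!is_equal || !inclusive)
        off++;

    return off;
}
```

In the patch, the case where the offset runs past the last datum is the one that returns only the default partition, if there is one.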
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..251355c62f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,19 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ int partkeyidx, int opstrategy,
+ Expr *expr, Datum *value);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ Bitmapset *srcparts,
+ PartitionPruneStepCombine *cstep);
+static Bitmapset *get_partitions_for_null_keys(PartitionPruneContext *context,
+ Bitmapset *keyisnull);
+static Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1573,865 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_unpruned_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of the matching partition indexes, or NULL if none can
+ * match.
+ */
+Bitmapset *
+get_unpruned_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result = bms_add_range(NULL, 0, context->nparts - 1);
+ ListCell *lc;
+
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepNoop:
+ /* no-op */
+ break;
+
+ case T_PartitionPruneStepNullness:
+ {
+ PartitionPruneStepNullness *nstep =
+ (PartitionPruneStepNullness *) step;
+
+ if (!bms_is_empty(nstep->keyisnull))
+ {
+ Bitmapset *step_parts;
+
+ step_parts = get_partitions_for_null_keys(context,
+ nstep->keyisnull);
+ result = bms_int_members(result, step_parts);
+ }
+
+ if (!bms_is_empty(nstep->keyisnotnull))
+ {
+ Bitmapset *step_parts;
+
+ /*
+ * The following will select all partitions that contain
+ * non-null values.
+ */
+ step_parts = get_partitions_for_keys(context, 0,
+ NULL, 0);
+ result = bms_int_members(result, step_parts);
+ }
+
+ break;
+ }
+
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) step;
+ Bitmapset *step_parts;
+ ListCell *lc1;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+
+ keyno = nvalues = 0;
+ foreach(lc1, opstep->values)
+ {
+ Expr *expr = lfirst(lc1);
+ Datum datum;
+
+ if (keyno > nvalues)
+ break;
+
+ if (partkey_datum_from_expr(context, keyno,
+ opstep->opstrategy,
+ expr, &datum))
+ {
+ values[nvalues++] = datum;
+ }
+
+ keyno++;
+ }
+
+ step_parts = get_partitions_for_keys(context,
+ opstep->opstrategy,
+ values, nvalues);
+ result = bms_int_members(result, step_parts);
+ break;
+ }
+
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep =
+ (PartitionPruneStepCombine *) step;
+
+ result = perform_pruning_combine_step(context, result, cstep);
+ break;
+ }
+
+ default:
+ break;
+ }
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ int partkeyidx, int opstrategy,
+ Expr *expr, Datum *value)
+{
+ Oid exprTyp = exprType((Node *) expr);
+
+ if (context->partopcintype[partkeyidx] != exprTyp)
+ {
+ Oid new_supfuncid;
+ int16 procnum;
+
+
+ procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+ new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
+ context->partopcintype[partkeyidx],
+ exprTyp, procnum);
+ fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
+ }
+
+ /*
+ * Add more expression types here as needed to support the requirements
+ * of the higher-level code.
+ */
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+static Bitmapset *
+get_partitions_for_null_keys(PartitionPruneContext *context,
+ Bitmapset *keyisnull)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ /* No pruning possible. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ case PARTITION_STRATEGY_LIST:
+ /*
+ * NULLs may only exist in the NULL partition, or in the
+ * default, if there's no NULL partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ /* Only the default range partition accepts nulls. */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy");
+ }
+
+ /* Prune all partitions as no partition has nulls. */
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that will need to be scanned for the
+ * given lookup keys
+ *
+ * Input:
+ * 'opstrategy' is the strategy number of the clause operator, and
+ * 'values' holds 'nvalues' datums to compare with the partition bounds.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes,
+ default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ Bitmapset *result = NULL;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ bool keyisnull[PARTITION_MAX_KEYS];
+ int greatest_modulus,
+ result_index;
+
+ /*
+ * In this case, can only do pruning if we know values for all
+ * the keys and they're all non-null.
+ */
+ Assert(nvalues == context->partnatts);
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ memset(keyisnull, false, nvalues * sizeof(bool));
+ rowHash = compute_hash_value(partnatts, partsupfunc, values,
+ keyisnull);
+ result_index = partindices[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(partnatts == 1);
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ /*
+ * For range queries, always include the default list partition.
+ * Because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned. This may not technically be true for some data types
+ * (e.g. integer types); however, we currently lack any sort of
+ * infrastructure to provide us with proofs that would allow us to
+ * do anything smarter here.
+ */
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0)
+ {
+ /*
+ * Skip the datum at off unless it equals the key and
+ * the operator is inclusive (>=).
+ */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are
+ * greater, which in turn means that all
+ * partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is past the last datum we have partitions
+ * for, the only possible partition that could
+ * contain a match is the default partition.
+ * Return that, if it exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is negative, no datum of a non-default
+ * partition satisfies the clause, so there isn't one
+ * to return. Return the default partition if one exists.
+ */
+ if (off < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid btree operator strategy");
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off+1] >= 0)
+ return bms_make_singleton(partindices[off+1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off-1],
+ boundinfo->kind[off-1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off+1],
+ boundinfo->kind[off+1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off+1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off+1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * include all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true; /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /*
+ * Matched prefix of the partition bound at off.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * select none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid btree operator strategy");
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition if
+ * some keys could be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+ }
+ break;
+
+ default:
+ result = NULL;
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ }
+
+ return result;
+}
+
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ Bitmapset *srcparts,
+ PartitionPruneStepCombine *cstep)
+{
+ ListCell *lc;
+ Bitmapset *result = srcparts;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ {
+ Bitmapset *orparts = NULL;
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ argparts = get_unpruned_partitions(context,
+ list_make1(step));
+ orparts = bms_add_members(orparts, argparts);
+ }
+
+ result = bms_int_members(result, orparts);
+ break;
+ }
+
+ case COMBINE_AND:
+ {
+ Bitmapset *andparts = NULL;
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ argparts = get_unpruned_partitions(context,
+ list_make1(step));
+ andparts = andparts == NULL
+ ? argparts
+ : bms_int_members(andparts, argparts);
+ }
+
+ result = bms_int_members(result, andparts);
+ break;
+ }
+
+ case COMBINE_NOT:
+ {
+ Bitmapset *notparts;
+ Datum *ne_datums;
+ int n_ne_datums = list_length(cstep->argvalues),
+ i;
+
+ ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
+ i = 0;
+ foreach(lc, cstep->argvalues)
+ {
+ Expr *expr = lfirst(lc);
+ Datum datum;
+
+ if (partkey_datum_from_expr(context, 0, BTEqualStrategyNumber,
+ expr, &datum))
+ ne_datums[i++] = datum;
+ }
+ notparts = get_partitions_excluded_by_ne_datums(context,
+ ne_datums,
+ i); /* only the datums actually filled in above */
+ result = bms_del_members(result, notparts);
+ break;
+ }
+
+ default:
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /* No partitions can be excluded if none of the datums were found. */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
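To illustrate the counting done in get_partitions_excluded_by_ne_datums() above, here is a standalone sketch that is not part of the patch. It assumes integer list bounds held in plain arrays; list_datums, datum_part and excluded_by_ne are made-up names standing in for boundinfo->datums, boundinfo->indexes and the real function, and a linear scan replaces partition_list_bsearch().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPARTS  3
#define NDATUMS 5

/*
 * Hypothetical stand-ins for boundinfo->datums and boundinfo->indexes:
 * partition 0 accepts (1, 2), partition 1 accepts (3), partition 2
 * accepts (4, 5).
 */
static const int list_datums[NDATUMS] = {1, 2, 3, 4, 5};
static const int datum_part[NDATUMS] = {0, 0, 1, 2, 2};

/*
 * Set excluded[p] for every partition p whose allowed values all appear in
 * ne_vals[] (the constants of the query's <> clauses).  Returns the number
 * of excluded partitions.
 */
static int
excluded_by_ne(const int *ne_vals, int n_ne, bool *excluded)
{
    bool        found[NDATUMS];
    int         datums_in_part[NPARTS];
    int         datums_found[NPARTS];
    int         nexcluded = 0;
    int         i;
    int         j;

    assert(n_ne >= 0);
    memset(found, 0, sizeof(found));
    memset(datums_in_part, 0, sizeof(datums_in_part));
    memset(datums_found, 0, sizeof(datums_found));

    /* Mark each bound matched by some <> value; duplicates collapse. */
    for (i = 0; i < n_ne; i++)
        for (j = 0; j < NDATUMS; j++)
            if (list_datums[j] == ne_vals[i])
                found[j] = true;

    /* Count allowed and matched datums per partition. */
    for (j = 0; j < NDATUMS; j++)
    {
        datums_in_part[datum_part[j]]++;
        if (found[j])
            datums_found[datum_part[j]]++;
    }

    /* A partition goes away only if every one of its values was matched. */
    for (i = 0; i < NPARTS; i++)
    {
        excluded[i] = datums_found[i] > 0 &&
            datums_found[i] >= datums_in_part[i];
        if (excluded[i])
            nexcluded++;
    }

    return nexcluded;
}
```

With <> values 1, 2 and 4 (1 given twice), only partition 0 is excluded in this sketch; partition 2 survives because its value 5 has not been ruled out.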
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f84da801c6..dd2974e0e3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,62 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepNoop
+ */
+static PartitionPruneStepNoop *
+_copyPartitionPruneStepNoop(const PartitionPruneStepNoop *from)
+{
+ PartitionPruneStepNoop *newnode = makeNode(PartitionPruneStepNoop);
+
+ /* Nothing to copy. */
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(values);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepNullness
+ */
+static PartitionPruneStepNullness *
+_copyPartitionPruneStepNullness(const PartitionPruneStepNullness *from)
+{
+ PartitionPruneStepNullness *newnode = makeNode(PartitionPruneStepNullness);
+
+ COPY_BITMAPSET_FIELD(keyisnull);
+ COPY_BITMAPSET_FIELD(keyisnotnull);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(argsteps);
+ COPY_NODE_FIELD(argvalues);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5022,6 +5078,18 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepNoop:
+ retval = _copyPartitionPruneStepNoop(from);
+ break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepNullness:
+ retval = _copyPartitionPruneStepNullness(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..1b475a7395 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,27 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepNoop:
+ /* No sub-structure. */
+ return true;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->values, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+
+ if (walker((Node *) cstep->argsteps, context))
+ return true;
+ if (walker((Node *) cstep->argvalues, context))
+ return true;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2953,36 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepNoop:
+ {
+ PartitionPruneStepNoop *noopstep = (PartitionPruneStepNoop *) node;
+ PartitionPruneStepNoop *newnode;
+
+ FLATCOPY(newnode, noopstep, PartitionPruneStepNoop);
+
+ return (Node *) newnode;
+ }
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->values, opstep->values, List *);
+
+ return (Node *) newnode;
+ }
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+ PartitionPruneStepCombine *newnode;
+
+ FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
+ MUTATE(newnode->argsteps, cstep->argsteps, List *);
+ MUTATE(newnode->argvalues, cstep->argvalues, List *);
+
+ return (Node *) newnode;
+ }
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a00eb..542c4a2bca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -862,6 +863,7 @@ static void
set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
+ Relids live_children = NULL;
int parentRTindex = rti;
bool has_live_children;
double parent_rows;
@@ -875,6 +877,9 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ live_children = prune_append_rel_partitions(root, rel);
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1128,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
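As a rough picture of what the set_append_rel_size() change above does with live_children, here is a standalone sketch that is not part of the patch. It assumes child RT indexes below 64 so that a single integer can stand in for the Relids bitmapset; ChildSet, childset_add, childset_is_member and keep_live_children are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit N set means RT index N is live. */
typedef uint64_t ChildSet;

static ChildSet
childset_add(ChildSet set, int relid)
{
    assert(relid >= 0 && relid < 64);
    return set | (UINT64_C(1) << relid);
}

static int
childset_is_member(ChildSet set, int relid)
{
    assert(relid >= 0 && relid < 64);
    return (int) ((set >> relid) & 1);
}

/*
 * Copy into scanned[] the RT indexes from child_relids[] that survived
 * pruning, preserving their order.  Returns how many were kept.
 */
static int
keep_live_children(const int *child_relids, int nchildren,
                   ChildSet live_children, int *scanned)
{
    int         nscanned = 0;
    int         i;

    for (i = 0; i < nchildren; i++)
    {
        /* Pruned child: the patch marks such a rel dummy and moves on. */
        if (!childset_is_member(live_children, child_relids[i]))
            continue;
        scanned[nscanned++] = child_relids[i];
    }

    return nscanned;
}
```

In the patch itself the membership test is bms_is_member(appinfo->child_relid, live_children), and a pruned child gets set_dummy_rel_pathlist() rather than being dropped from an array.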
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..475eccf765
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1252 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing a provided set of clauses with the bounds of the table's
+ * partitions
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+static List *generate_partition_pruning_steps_internal(
+ PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(
+ PartitionPruneContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ Bitmapset **keyisnull, Bitmapset **keyisnotnull,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(PartClauseInfo *last, List *prefix);
+static List *get_steps_using_prefix_recurse(PartClauseInfo *last,
+ List *prefix,
+ ListCell *start_in_prefix,
+ List *step_values);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ List *pruning_steps;
+ bool constfalse;
+ int partnatts = rel->part_scheme->partnatts;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(&context,
+ clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes;
+
+ partindexes = get_unpruned_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ return generate_partition_pruning_steps_internal(context, clauses,
+ constfalse);
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * For each operator clause that's matched with a partition key, we generate
+ * a PartitionPruneStepOp containing relevant details of the operator and
+ * the expression whose value to use for comparison against partition bounds.
+ *
+ * If we encounter an OR clause, we generate a PartitionPruneStepCombine whose
+ * arguments are other partition pruning steps, each of which might be a
+ * PartitionPruneStepOp or another PartitionPruneStepCombine.
+ *
+ * If we find a RestrictInfo that's marked as pseudoconstant and contains a
+ * constant false value for clause, we stop generating any further steps and
+ * return NIL (no pruning steps) after setting *constfalse to true.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers
+ * that still need the original list should pass a copy of it to this
+ * function.
+ */
+static List *
+generate_partition_pruning_steps_internal(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber],
+ *ne_clauses = NIL;
+ Bitmapset *keyisnull = NULL,
+ *keyisnotnull = NULL;
+ bool foundkeyclause = false;
+ bool need_next_key;
+ List *steps = NIL;
+ ListCell *lc;
+ int i;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *all_arg_steps = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /* Get pruning step for each arg. */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+ if (argsteps != NIL)
+ {
+ if (list_length(argsteps) == 1)
+ all_arg_steps = lappend(all_arg_steps,
+ linitial(argsteps));
+ else
+ {
+ PartitionPruneStepCombine *argcomb;
+
+ /* Make a nested AND/OR combine step. */
+ Assert(IsA(arg, BoolExpr));
+ Assert(((BoolExpr *) arg)->boolop != NOT_EXPR);
+ argcomb = makeNode(PartitionPruneStepCombine);
+ if (((BoolExpr *) arg)->boolop == AND_EXPR)
+ argcomb->combineOp = COMBINE_AND;
+ else if (((BoolExpr *) arg)->boolop == OR_EXPR)
+ argcomb->combineOp = COMBINE_OR;
+ argcomb->argsteps = argsteps;
+ argcomb->argvalues = NIL;
+
+ all_arg_steps = lappend(all_arg_steps, argcomb);
+ }
+ }
+ else
+ {
+ List *partconstr = context->partition_qual;
+ PartitionPruneStepNoop *noop;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ context->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ noop = makeNode(PartitionPruneStepNoop);
+ all_arg_steps = lappend(all_arg_steps, noop);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_OR;
+ combineStep->argsteps = all_arg_steps;
+ combineStep->argvalues = NIL;
+ steps = lappend(steps, combineStep);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ bool is_neop_listp = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+ bool unsupported_clause = false;
+
+ switch (match_clause_to_partition_key(context, clause, partkey, i,
+ &keyisnull, &keyisnotnull,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ foundkeyclause = true;
+ Assert(pc != NULL);
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ /*
+ * match_clause_to_partition_key() already set the values
+ * in keyisnull or keyisnotnull for us.
+ */
+ foundkeyclause = true;
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ steps = list_concat(steps, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /* go check for the next key. */
+ break;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ unsupported_clause = true;
+ break;
+
+ default:
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /* There was nothing but combining steps in the clauses we got. */
+ if (!foundkeyclause)
+ return steps;
+
+ /*
+ * Generate PartitionPruneStepOp nodes from the clauses in keyclauses
+ * lists.
+ */
+
+ /*
+ * Group clauses according to the operator strategies, generating one list
+ * for each partitioning operator strategy.
+ */
+ need_next_key = true;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ bool need_cur_less = true,
+ need_cur_eq = true,
+ need_cur_greater = true;
+ List *clauselist = keyclauses[i];
+
+ if (!need_next_key || clauselist == NIL)
+ break;
+
+ /*
+ * Check whether we need this key's clauses. We don't if no requisite
+ * clause was found for the immediately preceding column.
+ */
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ int j;
+
+ for (j = 0; j < BTMaxStrategyNumber; j++)
+ {
+ if (btree_clauses[j] != NIL)
+ {
+ PartClauseInfo *last = llast(btree_clauses[j]);
+
+ switch (last->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_eq = false;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_greater = false;
+ break;
+ }
+ }
+ }
+
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ if (hash_clauses[HTEqualStrategyNumber - 1] != NIL)
+ {
+ PartClauseInfo *last;
+
+ last = llast(hash_clauses[HTEqualStrategyNumber - 1]);
+ if (i > last->keyno + 1)
+ need_cur_eq = false;
+ }
+ break;
+ }
+
+ default:
+ break;
+ }
+
+ if (clauselist == NIL ||
+ (!need_cur_less && !need_cur_eq && !need_cur_greater))
+ break;
+
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ bool need_cur_clause = true,
+ inclusive = false;
+
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true; /* fall through */
+ case BTLessStrategyNumber:
+ need_cur_clause = need_cur_less;
+ if (!inclusive)
+ need_next_key = false;
+ break;
+ case BTEqualStrategyNumber:
+ need_cur_clause = need_cur_eq;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true; /* fall through */
+ case BTGreaterStrategyNumber:
+ need_cur_clause = need_cur_greater;
+ if (!inclusive)
+ need_next_key = false;
+ break;
+ }
+
+ if (need_cur_clause)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ if (need_cur_eq)
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ break;
+ }
+ }
+ }
+
+ /*
+ * If we didn't find clauses for all partition columns in the hash
+ * partitioning case, give up on pruning.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH &&
+ i < context->partnatts)
+ return NIL;
+
+ /*
+ * Generate actual steps for various operator strategies by generating
+ * tuples of values, possibly multiple per operator strategy.
+ */
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each non-equality strategy, generate tuples of values such
+ * that each tuple's non-last values come from an equality clause.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ pc = lfirst(lc);
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ if (prefix == NIL && pc->keyno > 0)
+ continue;
+
+ /*
+ * Considering pc->value as the last value in the pruning
+ * tuple, try to generate pruning steps for tuples
+ * containing various combinations of values for earlier
+ * columns from the clauses in prefix.
+ */
+ pc_steps = get_steps_using_prefix(pc, prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+ List *pc_steps;
+
+ foreach(lc, eq_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *prefix = NIL;
+ ListCell *lc1;
+
+ /* Skip to the last column. */
+ if (pc->keyno < context->partnatts - 1)
+ continue;
+
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ prefix = lappend(prefix, eqpc);
+ }
+
+ pc_steps = get_steps_using_prefix(pc, prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ break;
+ }
+
+ default:
+ break;
+ }
+
+ /* Combine values from all <> operator clauses into one prune step. */
+ if (ne_clauses != NIL)
+ {
+ List *argvalues = NIL;
+ PartitionPruneStepCombine *combineStep;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ argvalues = lappend(argvalues, pc->value);
+ }
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_NOT;
+ combineStep->argsteps = NIL;
+ combineStep->argvalues = argvalues;
+ steps = lappend(steps, combineStep);
+ }
+
+ /*
+ * Generate one prune step for the information derived from IS NULL and
+ * IS NOT NULL clauses
+ */
+ if (!bms_is_empty(keyisnull) || !bms_is_empty(keyisnotnull))
+ {
+ PartitionPruneStepNullness *nstep;
+
+ nstep = makeNode(PartitionPruneStepNullness);
+ nstep->keyisnull = keyisnull;
+ nstep->keyisnotnull = keyisnotnull;
+ steps = lappend(steps, nstep);
+ }
+
+ return steps;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * A word on the outputs this produces. *pc will contain the PartClauseInfo
+ * for this clause if it is successfully selected for pruning, which requires
+ * that the clause be a simple operator clause or become one after we recognize
+ * it as a specially-shaped Boolean clause. For clauses that come in the form of
+ * a ScalarArrayOpExpr, we don't generate a PartClauseInfo, but rather
+ * recursively generate pruning steps for the values contained therein.
+ * *is_neop_listp is set if the clause contains a <> operator whose negator
+ * is a btree equality operator and list partitioning is in use.
+ *
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(PartitionPruneContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ Bitmapset **keyisnull, Bitmapset **keyisnotnull,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ Expr *value;
+ Oid partopfamily = context->partopfamily[partkeyidx],
+ partcoll = context->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &value))
+ {
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ value = rightop;
+ else if (equal(rightop, partkey))
+ {
+ value = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * the clause is useless no matter which key it's matched to.
+ */
+
+ /* Only allow strict operators. This will guarantee nulls are filtered. */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) value))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, so we try to perform partition pruning with such
+ * an operator only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator is
+ * a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(partkeyidx, *keyisnull))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ opclause = copyObject(opclause);
+ opclause->opno = negator;
+ }
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /*
+ * If commuted before matching with the key, switch the
+ * clause's operator to the commutator.
+ */
+ if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the
+ * clauses to the end of the list that's being processed
+ * currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps = generate_partition_pruning_steps_internal(context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps = generate_partition_pruning_steps_internal(context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(partkeyidx, *keyisnotnull))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ *keyisnull = bms_add_member(*keyisnull, partkeyidx);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(partkeyidx, *keyisnull))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ *keyisnotnull = bms_add_member(*keyisnotnull, partkeyidx);
+ }
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing a true or false value and returns true
+ * if we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Recursively generate tuples and subsequently a PartitionPruneStepOp for
+ * each tuple.
+ *
+ * Example: Consider a partition key named (a, b, c) and a set of mutually
+ * AND'd clauses a <= 1 and a <= 2 and b <= 3 and b <= 4 and c = 2. If the
+ * caller passed c = 2 as 'last', 'prefix' should contain a <= 1, a <= 2,
+ * b <= 3 and b <= 4. The pruning steps generated as a result, all using the
+ * = operator (from c = 2), will contain the following tuples respectively:
+ * (1, 3, 2), (1, 4, 2), (2, 3, 2), and (2, 4, 2).
+ */
+static List *
+get_steps_using_prefix(PartClauseInfo *last, List *prefix)
+{
+ /* Quick exit if there are no values to prefix last's value with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
+
+ step->opstrategy = last->op_strategy;
+ step->values = list_make1(last->value);
+
+ return list_make1(step);
+ }
+
+ return get_steps_using_prefix_recurse(last, prefix, list_head(prefix),
+ NIL);
+}
+
+static List *
+get_steps_using_prefix_recurse(PartClauseInfo *last,
+ List *prefix,
+ ListCell *start_in_prefix,
+ List *step_values)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int step_keyno;
+
+ Assert(start_in_prefix != NULL);
+ step_keyno = ((PartClauseInfo *) lfirst(start_in_prefix))->keyno;
+ if (step_keyno == last->keyno - 1)
+ {
+ /*
+ * Recursion ends here. We generate pruning steps here by
+ * finalizing the step_values list.
+ */
+ Assert(list_length(step_values) == step_keyno);
+ for_each_cell(lc, start_in_prefix)
+ {
+ PartClauseInfo *prefix_pc = lfirst(lc);
+ PartitionPruneStepOp *step;
+ List *step_values1;
+
+ if (prefix_pc->keyno > step_keyno)
+ break;
+
+ step_values1 = list_copy(step_values);
+ step_values1 = lappend(step_values1, prefix_pc->value);
+ step_values1 = lappend(step_values1, last->value);
+ step = makeNode(PartitionPruneStepOp);
+ step->opstrategy = last->op_strategy;
+ step->values = step_values1;
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start_in_prefix;
+
+ for_each_cell(lc, start_in_prefix)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > step_keyno)
+ break;
+ }
+ next_start_in_prefix = lc;
+
+ for_each_cell(lc, start_in_prefix)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == 0)
+ {
+ /* Start recursion for a new keyno == 0 value */
+ list_free(step_values);
+ step_values = list_make1(pc->value);
+ }
+ else if (pc->keyno == step_keyno)
+ step_values = lappend(step_values, pc->value);
+ else
+ break;
+
+ result = list_concat(result,
+ list_copy(get_steps_using_prefix_recurse(last,
+ prefix,
+ next_start_in_prefix,
+ step_values)));
+ }
+ }
+
+ return result;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 709a00924e..e272c445bf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..49c0546e5f 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,38 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition? */
+ bool has_default_part;
+
+ /* Partition qual, if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +105,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_unpruned_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..206bca3023 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -191,6 +191,11 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepNoop,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepNullness,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..5e3c1d3379 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,52 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionPruneStep - base type for nodes representing a partition pruning
+ * step
+ *----------
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+} PartitionPruneStep;
+
+/* a no-op step that doesn't prune any of the partitions. */
+typedef struct PartitionPruneStepNoop
+{
+ PartitionPruneStep step;
+} PartitionPruneStepNoop;
+
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *values;
+} PartitionPruneStepOp;
+
+typedef struct PartitionPruneStepNullness
+{
+ PartitionPruneStep step;
+
+ Bitmapset *keyisnull;
+ Bitmapset *keyisnotnull;
+} PartitionPruneStepNullness;
+
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND,
+ COMBINE_NOT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *argsteps;
+ List *argvalues;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 08a177dac4..b687924443 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -663,6 +665,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..d9ac2b49cb
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index d768dc0215..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1739,11 +1739,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1930,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 775bba6547..ef767e9f30 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1229,13 +1220,7 @@ explain (costs off) select * from hp where a = 1 and b = 'xxx';
Append
-> Seq Scan on hp0
Filter: ((a = 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a = 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a = 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(9 rows)
+(3 rows)
explain (costs off) select * from hp where a is null and b = 'xxx';
QUERY PLAN
@@ -1255,29 +1240,17 @@ explain (costs off) select * from hp where a = 10 and b = 'xxx';
QUERY PLAN
--------------------------------------------------
Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a = 10) AND (b = 'xxx'::text))
-> Seq Scan on hp2
Filter: ((a = 10) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(9 rows)
+(3 rows)
explain (costs off) select * from hp where a = 10 and b = 'yyy';
QUERY PLAN
--------------------------------------------------
Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a = 10) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'yyy'::text))
-> Seq Scan on hp3
Filter: ((a = 10) AND (b = 'yyy'::text))
-(9 rows)
+(3 rows)
explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
QUERY PLAN
@@ -1305,11 +1278,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1317,13 +1292,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1331,11 +1314,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1345,7 +1330,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1357,13 +1342,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1492,22 +1479,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 317ff479aa..9e75f456bc 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
--
2.11.0
Attachment: v36-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 0f36501626a4ec289ef522b4aaea7da2bf570283 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v36 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 ------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 100 ++++++++++++++++++++-------------
src/backend/optimizer/plan/planner.c | 94 +++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 ++--------------
src/backend/optimizer/util/relnode.c | 3 +
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++-------
src/include/optimizer/planner.h | 5 --
10 files changed, 107 insertions(+), 220 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index dd2974e0e3..a235ce1704 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2316,21 +2316,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5109,9 +5094,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index ee8d925db1..ebf1827a6b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3186,9 +3176,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 1785ea3918..9c8811f071 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4073,9 +4064,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 542c4a2bca..08570ce25d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -878,8 +878,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
live_children = prune_append_rel_partitions(root, rel);
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1320,6 +1332,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1330,7 +1348,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1357,49 +1374,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1418,9 +1441,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 24e6c46396..0d685863ab 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -559,7 +559,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -574,6 +573,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1116,12 +1116,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1193,10 +1193,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1427,6 +1429,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1527,6 +1533,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1534,7 +1555,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -5973,65 +5994,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 206bca3023..3bbfc0998f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b687924443..1d801b226f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -671,6 +675,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2123,27 +2128,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
--
2.11.0
Hi Amit,
On 03/13/2018 07:37 AM, Amit Langote wrote:
I will continue working on improving the comments / cleaning things up and
post a revised version soon, but until then please look at the attached.
Passes check-world.
Some minor comments:
0001: Ok
0002: Ok
0003:
* Trailing white space
* pruning.c
  - partkey_datum_from_expr
    * "Add more expression types..." -- Are you planning to add more of
      these? Otherwise change the comment
  - get_partitions_for_null_keys
    * Documentation for method
    * 'break;' missing for _HASH and default case
  - get_partitions_for_keys
    * 'break;'s are outside of the 'case' blocks
    * The 'switch(opstrategy)'s could use some {} blocks
    * 'break;' missing from default
  - perform_pruning_combine_step
    * Documentation for method
* nodeFuncs.c
  - Missing 'break;'s to follow style
0004: Ok
Best regards,
Jesper
Amit Langote wrote:
I will continue working on improving the comments / cleaning things up and
post a revised version soon, but until then please look at the attached.
I tried to give this a read. It looks like pretty neat stuff -- as far as I
can tell, it follows Robert's sketch for how this should work. The fact
that it's under-commented makes me unable to follow it too closely
though (I felt like adding a few "wtf?" comments here and there), so
it's taking me a bit to follow things in detail. Please do submit
improved versions as you have them.
I think you're using an old version of pg_bsd_indent.
In particular need of commentary
* match_clause_to_partition_key() should indicate which params are
outputs and what they get
* get_steps_using_prefix already has a comment, but it doesn't really
explain much. (I'm not sure why you use the term "tuple" here. I mean,
mathematically it probably makes sense, but in the overall context it
seems just confusing.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
By the way, I checked whether patch 0002 (additional tests) had an
effect on coverage, and couldn't detect any changes in terms of
lines/functions. Were you able to find any bugs in your code thanks to
the new tests that would not have been covered by existing tests?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14 March 2018 at 06:54, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
* "Add more expression types..." -- Are you planning to add more of
these ? Otherwise change the comment
Run-time pruning will deal with Param types here. The comment there
might be just to remind us all that the function must remain generic
enough so we can support more node types later. I don't particularly
need the comment for that patch, and I'm not quite sure if I should be
removing it in that patch. My imagination is not stretching far enough
today to think what we could use beyond Const and Param.
I don't feel strongly about the comment either way. This is just to
let you know that there are more up and coming things to do in that
spot.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/03/14 8:26, Alvaro Herrera wrote:
By the way, I checked whether patch 0002 (additional tests) had an
effect on coverage, and couldn't detect any changes in terms of
lines/functions. Were you able to find any bugs in your code thanks to
the new tests that would not have been covered by existing tests?
All tests except those for hash partitioning got added as bugs were found
in the patch and fixed. As you may know, constraint exclusion doesn't
help with pruning hash partitions, so those tests don't exercise any
existing functionality but are there for the *new* code.
Thanks,
Amit
On 14 March 2018 at 00:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is a significantly revised version of the patch, although I admit
it could still use some work with regard to comments and other cleanup.
Thanks for making all those changes. There's been quite a bit of churn!
I've looked over the patch and agree that there need to be more
comments to explain things. There were certainly times during my
review that I just skipped ahead having not understood what I'd been
looking at.
Here are the notes from my review:
1. I think get_unpruned_partitions is not a great name for this
function. get_matching_partitions would be better.
2. HASH partitioning can no longer prune non-matching partitions in
cases like: SELECT * FROM hashp WHERE partkey1 IS NULL and partkey2 IS
NULL; This might mean you need to process IS NULL and the key clauses
in the same step.
I see you have a test which checks the plan for
explain (costs off) select * from hp where a = 1 and b is null;
which ensures all partitions are included. Why?
3. Header comment for get_partitions_for_keys references 'keys' which
is not a parameter to that function.
4. If you wrote a function to process an individual step and called
that in a foreach loop inside get_unpruned_partitions, then you
wouldn't need to list_make1() here. Instead just call the function to
perform one step.
argparts = get_unpruned_partitions(context,
list_make1(step));
5. I see you're performing AND step combination in two different ways.
In perform_pruning_combine_step you do;
andparts = andparts == NULL
? argparts
: bms_int_members(andparts, argparts);
but in get_unpruned_partitions, you add the entire range to the set
using bms_add_range, then intersect on that.
The code seems a bit fragile and relies on get_unpruned_partitions
returning an allocated Bitmapset all the time, even if bms_is_empty()
is true. There should be no distinction between NULL and a Bitmapset
that returns true on bms_is_empty().
What you've got here probably works for now, but only due to the fact
that get_unpruned_partitions allocates the entire range with
bms_add_range and that bms_int_members does not return NULL with no
matching members if both input sets are non-NULL. If that were to
change then your code would misbehave in cases like:
WHERE <matches no partition> AND <matches a partition>;
When processing the <matches no partition> clause, no partitions would
match, then if that resulted in an empty set you'd then surprisingly
match partitions again despite the AND clause not actually making it
possible for any partitions to have matched.
Probably you just need to bms_add_range in
perform_pruning_combine_step too and perform bms_int_members
unconditionally, just like you're doing in get_unpruned_partitions.
6. The switch (last->op_strategy) in
generate_partition_pruning_steps_internal is missing a default ERROR
for unknown strategy.
7. The switch: switch (context->strategy) in
generate_partition_pruning_steps_internal should ERROR rather than
break when it sees an unknown partition strategy.
8. Instead of copying the opclause here, wouldn't it be better to have
this code come after you palloc the PartClause, and then just set up the
PartClause with the negator directly?
if (*is_neop_listp)
{
Assert(OidIsValid(negator));
opclause = copyObject(opclause);
opclause->opno = negator;
}
8. PartitionedChildRelInfo is still mentioned in typedefs.list
9. I don't quite understand PartitionPruneStepNoop. Why can't you just
skip adding anything to the list in
generate_partition_pruning_steps_internal?
10. The following test does not make sense:
explain (costs off) select * from hp where (a = 10 and b = 'yyy') or
(a = 10 and b = 'xxx') or (a is null and b is null);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp1
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp2
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
(9 rows)
Why do 4 partitions match when there are only 3 sets of clauses when
each one can only match a single partition?
11. What does "root" mean here?
-- case for list partitioned table that's not root
explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b
<> 'cd' and b <> 'xy' and b is not null;
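The concern in point 5 can be shown with a small self-contained sketch. It is not the patch's code: PartSet is a plain 64-bit mask standing in for a Bitmapset, and all_parts, and_combine_fragile and and_combine_robust are hypothetical names. It models the situation where an empty set and "no set yet" are indistinguishable, which is exactly when the NULL-sentinel style goes wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a Bitmapset: bit i set means partition i still matches. */
typedef uint64_t PartSet;

/* All partitions 0..nparts-1 (nparts must be between 1 and 64). */
static PartSet
all_parts(int nparts)
{
	assert(nparts >= 1 && nparts <= 64);
	return nparts == 64 ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
}

/*
 * Fragile variant: an empty accumulator doubles as "nothing combined yet",
 * so an argument that matched no partitions is silently forgotten.
 */
static PartSet
and_combine_fragile(const PartSet *args, int nargs)
{
	PartSet		result = 0;
	int			i;

	for (i = 0; i < nargs; i++)
		result = (result == 0) ? args[i] : (result & args[i]);
	return result;
}

/* Robust variant: start from every partition and always intersect. */
static PartSet
and_combine_robust(const PartSet *args, int nargs, int nparts)
{
	PartSet		result = all_parts(nparts);
	int			i;

	for (i = 0; i < nargs; i++)
		result &= args[i];
	return result;
}
```

For arguments {empty, {partition 2}} the fragile variant returns partition 2, while the robust variant correctly returns the empty set.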
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/03/14 20:50, David Rowley wrote:
On 14 March 2018 at 00:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is a significantly revised version of the patch, although I admit
it could still use some work with regard to comments and other cleanup.
Thanks for making all those changes. There's been quite a bit of churn!
Thank you David and Jesper for the reviews. I'm halfway done addressing
the comments and will submit an updated patch by tomorrow.
Thanks,
Amit
Hi David.
Thanks for the review.
On 2018/03/14 20:50, David Rowley wrote:
On 14 March 2018 at 00:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is a significantly revised version of the patch, although I admit
it could still use some work with regard to comments and other cleanup.
Thanks for making all those changes. There's been quite a bit of churn!
I've looked over the patch and agree that there need to be more
comments to explain things. There were certainly times during my
review that I just skipped ahead having not understood what I'd been
looking at.
Hope the attached version is easier to understand.
Here are the notes from my review:
1. I think get_unpruned_partitions is not a great name for this
function. get_matching_partitions would be better.
OK, I changed it to get_matching_partitions.
2. HASH partitioning can no longer prune non-matching partitions in
cases like: SELECT * FROM hashp WHERE partkey1 IS NULL and partkey2 IS
NULL; This might mean you need to process IS NULL and the key clauses
in the same step.
OK, I made PartitionPruneStepOp contain a bitmap of null key
indexes. It is consulted along with "values" for other columns only in
the hash partitioning case, whereas for list and range partitioning,
actual value(s) and null key information are passed via separate
PartitionPruneStepOp nodes, because in the latter case we cannot mix
nulls and "values".
So, now there is no need for PartitionPruneStepNullness and
get_partitions_for_null_keys().
I see you have a test which checks the plan for
explain (costs off) select * from hp where a = 1 and b is null;
which ensures all partitions are included. Why?
I had intended to remove support for hash partition pruning with some keys
being null that existed in the previous versions of the patch, but thought
it was too pessimistic after reading your comment. I've added it back and
it works like it used to in the previous versions.
3. Header comment for get_partitions_for_keys references 'keys' which
is not a parameter to that function.
Oops, fixed.
4. If you wrote a function to process an individual step and called
that in a foreach loop inside get_unpruned_partitions, then you
wouldn't need to list_make1() here. Instead just call the function to
perform one step.
argparts = get_unpruned_partitions(context,
list_make1(step));
Hmm, yes. I've introduced a perform_pruning_step() that switches on the
pruning step node type. It is called either by get_matching_partitions
while iterating over the list it receives, or by
perform_pruning_combine_step() on the arguments of a
PartitionPruneStepCombine.
5. I see you're performing AND step combination in two different ways.
In perform_pruning_combine_step you do;
andparts = andparts == NULL
? argparts
: bms_int_members(andparts, argparts);
but in get_unpruned_partitions, you add the entire range to the set
using bms_add_range, then intersect on that.
The code seems a bit fragile and relies on get_unpruned_partitions
returning an allocated Bitmapset all the time, even if bms_is_empty()
is true. There should be no distinction between NULL and a Bitmapset
that returns true on bms_is_empty().
What you've got here probably works for now, but only due to the fact
that get_unpruned_partitions allocates the entire range with
bms_add_range and that bms_int_members does not return NULL with no
matching members if both input sets are non-NULL. If that were to
change then your code would misbehave in cases like:
WHERE <matches no partition> AND <matches a partition>;
When processing the <matches no partition> clause, no partitions would
match, then if that resulted in an empty set you'd then surprisingly
match partitions again despite the AND clause not actually making it
possible for any partitions to have matched.
Probably you just need to bms_add_range in
perform_pruning_combine_step too and perform bms_int_members
unconditionally, just like you're doing in get_unpruned_partitions.
I noticed a number of things I could improve about this, also considering
your points above. Please check if the new structure is an improvement.
6. The switch (last->op_strategy) in
generate_partition_pruning_steps_internal is missing a default ERROR
for unknown strategy
I've fixed that.
7. The switch: switch (context->strategy) in
generate_partition_pruning_steps_internal should ERROR rather than
break when it sees an unknown partition strategy.
This one too.
8. Instead of copying the opclause here, wouldn't it be better to have
this code come after you palloc the PartClause, and then just set up the
PartClause with the negator directly?
if (*is_neop_listp)
{
Assert(OidIsValid(negator));
opclause = copyObject(opclause);
opclause->opno = negator;
}
Agreed, done.
8. PartitionedChildRelInfo is still mentioned in typedefs.list
Removed.
9. I don't quite understand PartitionPruneStepNoop. Why can't you just
skip adding anything to the list in
generate_partition_pruning_steps_internal?
10. The following test does not make sense:
explain (costs off) select * from hp where (a = 10 and b = 'yyy') or
(a = 10 and b = 'xxx') or (a is null and b is null);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp1
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp2
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b
= 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
(9 rows)
Why do 4 partitions match when there are only 3 sets of clauses when
each one can only match a single partition?
After bringing the support for hash partition pruning even with IS NULL
clauses back, this works as you'd expect.
11. What does "root" mean here?
-- case for list partitioned table that's not root
explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b
<> 'cd' and b <> 'xy' and b is not null;
It means a partitioned table that is not the root partitioned table.
Those <> clauses prune but not at the root level, only after recursing for
a list partitioned child of rlp.
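The structure described in the replies to points 4 to 7 can be sketched as follows. This is a simplified illustration rather than the code in the attached patch: PartSet is a 64-bit mask standing in for a Bitmapset, the Step struct and the names perform_step, perform_combine and matching_partitions are hypothetical, a base step carries a precomputed match set instead of looking at partition bounds, and top-level steps are simply intersected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a Bitmapset: bit i set means partition i still matches. */
typedef uint64_t PartSet;

typedef enum { STEP_OP, STEP_COMBINE } StepKind;
typedef enum { COMBINE_AND, COMBINE_OR } CombineOp;

typedef struct Step
{
	StepKind	kind;
	PartSet		matches;	/* STEP_OP: partitions this clause matches */
	CombineOp	combine;	/* STEP_COMBINE: how to merge the arguments */
	const struct Step *args;	/* STEP_COMBINE: argument steps */
	int			nargs;
} Step;

static PartSet perform_step(const Step *step, PartSet all);

/* Combine argument results; AND starts from all partitions, OR from none. */
static PartSet
perform_combine(const Step *step, PartSet all)
{
	PartSet		result = (step->combine == COMBINE_AND) ? all : 0;
	int			i;

	for (i = 0; i < step->nargs; i++)
	{
		PartSet		arg = perform_step(&step->args[i], all);

		if (step->combine == COMBINE_AND)
			result &= arg;
		else
			result |= arg;
	}
	return result;
}

/* Dispatch on the step kind; an unknown kind is treated as an error. */
static PartSet
perform_step(const Step *step, PartSet all)
{
	switch (step->kind)
	{
		case STEP_OP:
			return step->matches & all;
		case STEP_COMBINE:
			return perform_combine(step, all);
	}
	assert(0 && "unknown pruning step kind");
	return 0;
}

/* Run each top-level step once; no single-element list is ever built. */
static PartSet
matching_partitions(const Step *steps, int nsteps, PartSet all)
{
	PartSet		result = all;
	int			i;

	for (i = 0; i < nsteps; i++)
		result &= perform_step(&steps[i], all);
	return result;
}
```

Because AND always starts from the full set and intersects unconditionally, an argument that matches no partition yields an empty result regardless of what the other arguments match. An OR over three arguments that each match one partition can likewise produce at most three partitions, which is the expectation behind point 10.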
Attached updated patches.
Thanks,
Amit
Attachments:
v37-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 4a4c2ee04a9a40072c5566f4d4f7c80c8feed4ea Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v37 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..709a00924e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d576aa7350..08a177dac4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v37-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 6105e3d2b96fb827eb41e636646c728109dde0a4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v37 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v37-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 24c8bbce87dfc43b9215c6a821cd3e56d40329b3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v37 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 911 ++++++++++++++++++
src/backend/nodes/copyfuncs.c | 52 +
src/backend/nodes/nodeFuncs.c | 54 ++
src/backend/optimizer/path/allpaths.c | 23 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1270 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 35 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 41 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 25 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 318 +++++--
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 6 +
18 files changed, 2758 insertions(+), 91 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
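As a rough illustration of the flow the commit message describes (this is not code from the patch): pruning starts with every partition considered live, each pruning step yields the set of partitions it allows, and the running result is intersected with that set. The sketch below is only an assumption-laden toy. It uses a plain 64-bit mask in place of PostgreSQL's Bitmapset, and the names partset, prune_step, all_partitions, keep_from_index, keep_below_index and apply_pruning_steps are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means partition i is live. */
typedef uint64_t partset;

/* Hypothetical pruning step: a callback plus its argument. */
typedef partset (*prune_step_fn) (partset live, const void *arg);

typedef struct prune_step
{
	prune_step_fn fn;
	const void *arg;
} prune_step;

/* Set containing partitions 0 .. nparts - 1 (at most 64 in this sketch). */
static partset
all_partitions(int nparts)
{
	assert(nparts >= 0 && nparts <= 64);
	if (nparts == 64)
		return UINT64_MAX;
	return (UINT64_C(1) << nparts) - 1;
}

/* Example step: allow only partitions whose index is >= *(const int *) arg. */
static partset
keep_from_index(partset live, const void *arg)
{
	int			first = *(const int *) arg;
	partset		mask;

	if (first <= 0)
		mask = UINT64_MAX;
	else if (first >= 64)
		mask = 0;
	else
		mask = UINT64_MAX << first;
	return live & mask;
}

/* Example step: allow only partitions whose index is < *(const int *) arg. */
static partset
keep_below_index(partset live, const void *arg)
{
	int			limit = *(const int *) arg;

	if (limit <= 0)
		return 0;
	if (limit >= 64)
		return live;
	return live & all_partitions(limit);
}

/*
 * Start with all partitions live, then intersect the running result with
 * what each step allows.  With no steps, nothing is pruned.
 */
static partset
apply_pruning_steps(int nparts, const prune_step *steps, size_t nsteps)
{
	partset		result = all_partitions(nparts);
	size_t		i;

	for (i = 0; i < nsteps; i++)
		result &= steps[i].fn(result, steps[i].arg);
	return result;
}
```

With eight partitions and the two example steps applied with bounds 2 and 5, only partitions 2, 3 and 4 remain live. The real patch differs in important ways: steps can be nested combine steps, and the surviving set is then mapped to RT indexes for set_append_rel_size().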
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..fde04604c5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,22 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step,
+ Bitmapset *srcparts);
+static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset *srcparts);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ int partkeyidx, Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1576,904 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions, or NULL if none
+ * survive.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ /* Initially, no partitions have been pruned, so start with all of them. */
+ Bitmapset *result = bms_add_range(NULL, 0, context->nparts - 1);
+ ListCell *lc;
+
+ /*
+ * If there are multiple pruning steps, we perform them one after another,
+ * passing the result of one step as input to another. Based on the type
+ * of pruning step, perform_pruning_step may add or remove partitions from
+ * the set of partitions it receives as the input.
+ */
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ result = bms_int_members(result,
+ perform_pruning_step(context, step, result));
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_step
+ * Performs one PartitionPruneStep
+ */
+static Bitmapset *
+perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step,
+ Bitmapset *srcparts)
+{
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepNoop:
+ /* no-op */
+ break;
+
+ case T_PartitionPruneStepOp:
+ return perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+
+ case T_PartitionPruneStepCombine:
+ return perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ srcparts);
+
+ default:
+ elog(ERROR, "invalid partition pruning step: %d", nodeTag(step));
+ break;
+ }
+
+ return srcparts;
+}
+
+/*
+ * perform_pruning_base_step
+ * Returns indexes of partitions as given by get_partitions_for_keys
+ * for information contained in a given PartitionPruneStepOp
+ */
+static Bitmapset *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+
+ nvalues = 0;
+ lc = list_head(opstep->values);
+
+ /*
+ * Generate the partition look-up key.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ if (bms_is_member(keyno, opstep->nullkeys))
+ {
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ nvalues++;
+ continue;
+ }
+
+ if (keyno > nvalues)
+ break;
+
+ if (lc != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc);
+ if (partkey_datum_from_expr(context, keyno, expr, &datum))
+ values[nvalues++] = datum;
+ lc = lnext(lc);
+ }
+ }
+
+ return get_partitions_for_keys(context,
+ opstep->opstrategy,
+ values, nvalues,
+ opstep->nullkeys);
+}
+
+/*
+ * perform_pruning_combine_step
+ * Returns the set of partitions in 'srcparts' that remain after
+ * performing the pruning "combine" step specified in 'cstep'
+ */
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset *srcparts)
+{
+ ListCell *lc;
+
+ srcparts = bms_copy(srcparts);
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ {
+ Bitmapset *step_parts = NULL;
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ /* Recursively get partitions by performing this step. */
+ argparts = perform_pruning_step(context, step, srcparts);
+ step_parts = bms_add_members(step_parts, argparts);
+ }
+
+ return step_parts;
+ }
+
+ case COMBINE_AND:
+ {
+ /* Initially, no partitions have been pruned, so start with all of them. */
+ Bitmapset *step_parts = bms_add_range(NULL, 0,
+ context->nparts - 1);
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ argparts = perform_pruning_step(context, step, srcparts);
+ step_parts = bms_int_members(step_parts, argparts);
+ }
+
+ return step_parts;
+ }
+
+ case COMBINE_NOT:
+ {
+ Bitmapset *step_parts = NULL;
+ Datum *ne_datums;
+ int n_ne_datums = list_length(cstep->argvalues),
+ i;
+
+ /*
+ * XXX- The following ad-hoc method of pruning only works for list
+ * partitioning. It checks for each partition if all of its
+ * accepted values appear in ne_datums[].
+ */
+ ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
+ i = 0;
+ foreach(lc, cstep->argvalues)
+ {
+ Expr *expr = lfirst(lc);
+ Datum datum;
+
+ /*
+ * Note that we're passing 0 for partkeyidx, because there can
+ * be only one partition key column for list partitioning.
+ */
+ if (partkey_datum_from_expr(context, 0, expr, &datum))
+ ne_datums[i++] = datum;
+ }
+
+ step_parts = get_partitions_excluded_by_ne_datums(context,
+ ne_datums,
+ n_ne_datums);
+ return bms_del_members(srcparts, step_parts);
+ }
+
+ default:
+ /* Return the source partitions as is; should never happen. */
+ break;
+ }
+
+ return srcparts;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprTyp = exprType((Node *) expr);
+
+ if (context->partopcintype[partkeyidx] != exprTyp)
+ {
+ Oid new_supfuncid;
+ int16 procnum;
+
+
+ procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+ new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
+ context->partopcintype[partkeyidx],
+ exprTyp, procnum);
+ fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
+ }
+
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions that satisfy the
+ * given look up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes,
+ default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ Bitmapset *result = NULL;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i,
+ greatest_modulus,
+ result_index;
+
+ memset(isnull, false, partnatts * sizeof(bool));
+ if (!bms_is_empty(nullkeys))
+ {
+ i = -1;
+ while ((i = bms_next_member(nullkeys, i)) >= 0)
+ {
+ Assert(i < partnatts);
+ isnull[i] = true;
+ }
+ }
+
+ /*
+ * In this case, can only do pruning if we know values for all
+ * the keys and they're all non-null.
+ */
+ if (nvalues == context->partnatts)
+ {
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values,
+ isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ break;
+ }
+
+ case PARTITION_STRATEGY_LIST:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default
+ * partition if the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned. This may not technically be true for some data types
+ * (e.g. integer types), however, we currently lack any sort of
+ * infrastructure to provide us with proofs that would allow us to
+ * do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+ }
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0)
+ {
+ /*
+ * We don't want the matched datum to be in the
+ * result.
+ */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are
+ * greater, which in turn means that all
+ * partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have
+ * partitions for. The only possible partition that
+ * could contain a match is the default partition.
+ * Return that, if it exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ minoff = off;
+ break;
+ }
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default
+ * partitions, meaning there isn't one to return.
+ * Return the default partition if one exists.
+ */
+ if (off < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ maxoff = off;
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid btree operator strategy: %d",
+ opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ /* Nulls may exist in only the default range partition */
+ if (!bms_is_empty(nullkeys))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off+1] >= 0)
+ return bms_make_singleton(partindices[off+1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off-1],
+ boundinfo->kind[off-1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off+1],
+ boundinfo->kind[off+1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off+1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off+1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+ }
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * include all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+ }
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /*
+ * Matched prefix of the partition bound at off.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * select none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid btree operator strategy: %d",
+ opstrategy);
+ break;
+ }
+
+ Assert (minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition if
+ * some keys could be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /*
+ * No partitions can be excluded if none of the partitions accept the
+ * datums in ne_datums[].
+ */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f84da801c6..fb1abaceca 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,49 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepNoop
+ */
+static PartitionPruneStepNoop *
+_copyPartitionPruneStepNoop(const PartitionPruneStepNoop *from)
+{
+ PartitionPruneStepNoop *newnode = makeNode(PartitionPruneStepNoop);
+
+ /* Nothing to copy. */
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(values);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(argsteps);
+ COPY_NODE_FIELD(argvalues);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5022,6 +5065,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepNoop:
+ retval = _copyPartitionPruneStepNoop(from);
+ break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..9f50552e4e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,27 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepNoop:
+ /* No sub-structure. */
+ return true;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->values, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+
+ if (walker((Node *) cstep->argsteps, context))
+ return true;
+ if (walker((Node *) cstep->argvalues, context))
+ return true;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2953,39 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepNoop:
+ {
+ PartitionPruneStepNoop *noopstep = (PartitionPruneStepNoop *) node;
+ PartitionPruneStepNoop *newnode;
+
+ FLATCOPY(newnode, noopstep, PartitionPruneStepNoop);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->values, opstep->values, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+ PartitionPruneStepCombine *newnode;
+
+ FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
+ MUTATE(newnode->argsteps, cstep->argsteps, List *);
+ MUTATE(newnode->argvalues, cstep->argvalues, List *);
+
+ return (Node *) newnode;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8735e29807..ef64040798 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -869,12 +870,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ if (rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(root, rel);
+ did_pruning = true;
+ }
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1135,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) && did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..930e78f9d9
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1270 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+static List *generate_partition_pruning_steps_internal(
+ PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(
+ PartitionPruneContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ Bitmapset **nullkeys, Bitmapset **notnullkeys,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals or NULL if no partitions exist.
+ *
+ * Only call this if 'rel' corresponds to a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ int i;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ if (clauses == NIL)
+ {
+ /* If there are no clauses then include every partition. */
+ for (i = 0; i < rel->nparts; i++)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ else
+ {
+ PartitionPruneContext context;
+ List *pruning_steps;
+ bool constfalse;
+ int partnatts = rel->part_scheme->partnatts;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.relid = rel->relid;
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = partnatts;
+
+ context.partkeys = (Expr **) palloc(sizeof(Expr *) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ context.partkeys[i] = linitial(rel->partexprs[i]);
+
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+ context.has_default_part = rel->has_default_part;
+ context.partition_qual = rel->partition_qual;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(&context,
+ clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ Bitmapset *partindexes;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (context->has_default_part && context->partition_qual != NIL)
+ {
+ List *partqual = context->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, context->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ return generate_partition_pruning_steps_internal(context, clauses,
+ constfalse);
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * For each operator clause that's matched with a partition key, we generate
+ * a PartitionPruneStepOp containing relevant details of the operator and
+ * the expression whose value to use for comparison against partition bounds.
+ *
+ * If we encounter an OR clause, we generate a PartitionPruneStepCombine whose
+ * arguments are other partition pruning steps, each of which might be a
+ * PartitionPruneStepOp or another PartitionPruneStepCombine.
+ *
+ * If we find a RestrictInfo that's marked as pseudoconstant and contains a
+ * constant false value for clause, we stop generating any further steps and
+ * return NIL (no pruning steps) after setting *constfalse to true.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers
+ * should make a copy of any list they need to keep intact before passing it
+ * to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool foundkeyclause = false;
+ bool need_next_key;
+ List *steps = NIL;
+ ListCell *lc;
+ int i;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *all_arg_steps = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /* Get pruning step for each arg. */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+ if (argsteps != NIL)
+ {
+ if (list_length(argsteps) == 1)
+ all_arg_steps = lappend(all_arg_steps,
+ linitial(argsteps));
+ else
+ {
+ PartitionPruneStepCombine *argcomb;
+
+ /* Make a nested AND/OR combine step. */
+ Assert(IsA(arg, BoolExpr));
+ Assert(((BoolExpr *) arg)->boolop != NOT_EXPR);
+ argcomb = makeNode(PartitionPruneStepCombine);
+ if (((BoolExpr *) arg)->boolop == AND_EXPR)
+ argcomb->combineOp = COMBINE_AND;
+ else if (((BoolExpr *) arg)->boolop == OR_EXPR)
+ argcomb->combineOp = COMBINE_OR;
+ argcomb->argsteps = argsteps;
+ argcomb->argvalues = NIL;
+
+ all_arg_steps = lappend(all_arg_steps, argcomb);
+ }
+ }
+ else
+ {
+ List *partconstr = context->partition_qual;
+ PartitionPruneStepNoop *noop;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (context->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ context->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ noop = makeNode(PartitionPruneStepNoop);
+ all_arg_steps = lappend(all_arg_steps, noop);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_OR;
+ combineStep->argsteps = all_arg_steps;
+ combineStep->argvalues = NIL;
+ steps = lappend(steps, combineStep);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ /*
+ * Queue its args to be processed later within the same
+ * invocation.
+ */
+ clauses = list_concat(clauses,
+ list_copy(((BoolExpr *) clause)->args));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ for (i = 0; i < context->partnatts; i++)
+ {
+ Expr *partkey = context->partkeys[i];
+ bool is_neop_listp = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+ bool unsupported_clause = false;
+
+ switch (match_clause_to_partition_key(context, clause, partkey, i,
+ &nullkeys, &notnullkeys,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ foundkeyclause = true;
+ Assert(pc != NULL);
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ /*
+ * match_clause_to_partition_key() already set the values
+ * in nullkeys or notnullkeys for us.
+ */
+ foundkeyclause = true;
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ steps = list_concat(steps, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /* go check for the next key. */
+ break;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /* There was nothing but combining steps in the clauses we got. */
+ if (!foundkeyclause)
+ return steps;
+
+ /*
+ * Generate PartitionPruneStepOp nodes from the clauses in keyclauses
+ * lists.
+ */
+
+ /*
+ * Group clauses according to the operator strategies, generating one list
+ * for each partitioning operator strategy.
+ */
+ need_next_key = true;
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < context->partnatts; i++)
+ {
+ bool need_cur_less = true,
+ need_cur_eq = true,
+ need_cur_greater = true;
+ List *clauselist = keyclauses[i];
+
+ /*
+ * It's okay to not have opclauses for some columns in the hash case,
+ * but only if they've been explicitly requested to be null.
+ */
+ if (context->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NIL;
+
+ /*
+ * Check whether we need this key's clauses. Basically, we don't if
+ * we didn't find a requisite clause for the immediately preceding column.
+ */
+ if (context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ int j;
+
+ if (!need_next_key || clauselist == NIL)
+ break;
+
+ for (j = 0; j < BTMaxStrategyNumber; j++)
+ {
+ if (btree_clauses[j] != NIL)
+ {
+ PartClauseInfo *last = llast(btree_clauses[j]);
+
+ switch (last->op_strategy)
+ {
+ case BTLessStrategyNumber:
+ case BTLessEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_eq = false;
+ break;
+ case BTGreaterStrategyNumber:
+ case BTGreaterEqualStrategyNumber:
+ if (i > last->keyno + 1)
+ need_cur_greater = false;
+ break;
+
+ default:
+ elog(ERROR, "invalid btree strategy: %d",
+ last->op_strategy);
+ break;
+ }
+ }
+ }
+
+ if (!(need_cur_less || need_cur_eq || need_cur_greater))
+ break;
+ }
+
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ context->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ bool need_cur_clause = true,
+ inclusive = false;
+
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ case BTLessStrategyNumber:
+ need_cur_clause = need_cur_less;
+ if (!inclusive)
+ need_next_key = false;
+ break;
+ case BTEqualStrategyNumber:
+ need_cur_clause = need_cur_eq;
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ case BTGreaterStrategyNumber:
+ need_cur_clause = need_cur_greater;
+ if (!inclusive)
+ need_next_key = false;
+ break;
+ }
+
+ if (need_cur_clause)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ if (need_cur_eq)
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ context->strategy);
+ break;
+ }
+ }
+ }
+
+ /*
+ * Generate actual steps for various operator strategies by generating
+ * tuples of values, possibly multiple per operator strategy.
+ */
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each non-equality strategy, generate tuples of values such
+ * that each tuple's non-last values come from an equality clause.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ pc = lfirst(lc);
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ if (prefix == NIL && pc->keyno > 0)
+ continue;
+
+ /*
+ * Considering pc->value as the last value in the pruning
+ * tuple, try to generate pruning steps for tuples
+ * containing various combinations of values for earlier
+ * columns from the clauses in prefix.
+ */
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ NULL,
+ prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ PartClauseInfo *pc;
+ ListCell *lc1;
+
+ if (eq_clauses != NIL)
+ {
+ pc = llast(eq_clauses);
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+
+ prefix = lappend(prefix, pc);
+ }
+
+ for_each_cell(lc1, lc)
+ {
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c", context->strategy);
+ break;
+ }
+
+ /* Combine values from all <> operator clauses into one prune step. */
+ if (ne_clauses != NIL)
+ {
+ List *argvalues = NIL;
+ PartitionPruneStepCombine *combineStep;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ argvalues = lappend(argvalues, pc->value);
+ }
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_NOT;
+ combineStep->argsteps = NIL;
+ combineStep->argvalues = argvalues;
+ steps = lappend(steps, combineStep);
+ }
+
+ /*
+ * Generate one prune step for the information derived from IS NULL and
+ * IS NOT NULL clauses. Note that for IS NOT NULL clauses, simply having
+ * the step suffices; there is no need to propagate the exact details of which
+ * keys are required to be NOT NULL.
+ */
+ if (!bms_is_empty(nullkeys) || !bms_is_empty(notnullkeys))
+ {
+ PartitionPruneStepOp *opstep;
+
+ opstep = makeNode(PartitionPruneStepOp);
+ opstep->nullkeys = nullkeys;
+ steps = lappend(steps, opstep);
+ }
+
+ return steps;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * A word on the outputs this produces. *pc will contain a PartClauseInfo for
+ * this clause if it is successfully selected for pruning, that is, if it is
+ * a simple operator clause or becomes one after we recognize the clause as
+ * a specially-shaped Boolean clause. For clauses that come in the form of
+ * a ScalarArrayOpExpr, we don't generate a PartClauseInfo, but rather
+ * recursively generate pruning steps for the values contained therein.
+ * *is_neop_listp is set if the clause contains a <> operator whose negator
+ * is a btree equality operator and list partitioning is in use.
+ *
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(PartitionPruneContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ Bitmapset **nullkeys, Bitmapset **notnullkeys,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ Expr *value;
+ Oid partopfamily = context->partopfamily[partkeyidx],
+ partcoll = context->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &value))
+ {
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ value = rightop;
+ else if (equal(rightop, partkey))
+ {
+ value = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * the clause is useless no matter which key it's matched to.
+ */
+
+ /* Only allow strict operators. This will guarantee nulls are filtered. */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) value))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, so we try to perform partition pruning with such
+ * an operator only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm whether the operator is really '<>', check if its negator is
+ * a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULLs.
+ */
+ if (bms_is_member(partkeyidx, *nullkeys))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (context->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the
+ * clauses to the end of the list that's being processed
+ * currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps = generate_partition_pruning_steps_internal(context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps = generate_partition_pruning_steps_internal(context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ {
+ /* check for conflicting IS NOT NULLs */
+ if (bms_is_member(partkeyidx, *notnullkeys))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ *nullkeys = bms_add_member(*nullkeys, partkeyidx);
+ }
+ else
+ {
+ /* check for conflicting IS NULLs */
+ if (bms_is_member(partkeyidx, *nullkeys))
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ *notnullkeys = bms_add_member(*notnullkeys, partkeyidx);
+ }
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
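+ * Generate a list of PartitionPruneStepOp steps, each using step_opstrategy
+ * and a values tuple built from the clauses in 'prefix' plus step_lastvalue.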
+ */
+static List *
+get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix lastvalue with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
+
+ step->opstrategy = step_opstrategy;
+ step->values = list_make1(step_lastvalue);
+ step->nullkeys = step_nullkeys;
+
+ return list_make1(step);
+ }
+
+ return get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ list_head(prefix),
+ NIL);
+}
+
+static List *
+get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int step_keyno;
+
+ Assert(start != NULL);
+ step_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (step_keyno == step_lastkeyno - 1)
+ {
+ Assert(list_length(step_values) == step_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStepOp *step;
+ List *step_values1;
+
+ if (pc->keyno > step_keyno)
+ break;
+
+ step_values1 = list_copy(step_values);
+ step_values1 = lappend(step_values1, pc->value);
+ step_values1 = lappend(step_values1, step_lastvalue);
+
+ step = makeNode(PartitionPruneStepOp);
+ step->opstrategy = step_opstrategy;
+ step->values = step_values1;
+ step->nullkeys = step_nullkeys;
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > step_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == 0)
+ {
+ list_free(step_values);
+ step_values = list_make1(pc->value);
+ }
+ else if (pc->keyno == step_keyno)
+ step_values = lappend(step_values, pc->value);
+ else
+ break;
+
+ result =
+ list_concat(result,
+ list_copy(get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ next_start,
+ step_values)));
+ }
+ }
+
+ return result;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 709a00924e..e272c445bf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..a18ea6e0c3 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,38 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Table's range table index */
+ int relid;
+
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Expr **partkeys;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Is one of the partitions the default partition */
+ bool has_default_part;
+
+ /* Partition qual, if this is not the root partitioned table */
+ List *partition_qual;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +105,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..ea6d6dd5ae 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -191,6 +191,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepNoop,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..47ac3da77a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,45 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionPruneStep - base type for nodes representing a partition pruning
+ * step
+ *----------
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+} PartitionPruneStep;
+
+/* a no-op step that doesn't prune any of the partitions. */
+typedef struct PartitionPruneStepNoop
+{
+ PartitionPruneStep step;
+} PartitionPruneStepNoop;
+
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *values;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND,
+ COMBINE_NOT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *argsteps;
+ List *argvalues;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 08a177dac4..b687924443 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -663,6 +665,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..d9ac2b49cb
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(PartitionPruneContext *context,
+ List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index d768dc0215..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1739,11 +1739,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1930,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..d75a23e4a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1089,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1103,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1125,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1141,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1153,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1276,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1339,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d4765ce3b0..1488aebfe9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1575,6 +1575,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1587,6 +1588,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepNoop
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
--
2.11.0
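A note on the OR'ed hash-partitioning test in the patch above: each arm of the OR (a = 10 and b = 'yyy'; a = 10 and b = 'xxx'; a is null and b is null) prunes to a single partition on its own (hp3, hp2 and hp0 in the expected output), and the combined plan scans the union of those. The following is only a standalone sketch of that set arithmetic, not code from the patch. PartSet is a hypothetical 32-bit stand-in for Bitmapset, all sketch_* and SKETCH_* names are made up for illustration, and COMBINE_NOT is deliberately left out because its semantics are not spelled out in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means partition i is scanned. */
typedef uint32_t PartSet;

/* Illustrative subset of PartitionPruneCombineOp (assumption: NOT omitted). */
typedef enum SketchCombineOp
{
    SKETCH_COMBINE_OR,
    SKETCH_COMBINE_AND
} SketchCombineOp;

/* Set with one bit per partition, i.e. "nothing pruned". */
PartSet
sketch_all_parts(int nparts)
{
    assert(nparts > 0 && nparts <= 32);
    return (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/*
 * Combine the partition sets produced by the argument steps:
 * OR takes the union, AND takes the intersection.
 */
PartSet
sketch_combine(SketchCombineOp op, const PartSet *args, size_t nargs,
               int nparts)
{
    PartSet all = sketch_all_parts(nparts);
    PartSet result = (op == SKETCH_COMBINE_AND) ? all : 0;
    size_t  i;

    for (i = 0; i < nargs; i++)
    {
        if (op == SKETCH_COMBINE_AND)
            result &= args[i];
        else
            result |= args[i];
    }

    /* Never report a partition index beyond nparts. */
    return result & all;
}
```

With four partitions and the per-arm results {hp3}, {hp2} and {hp0}, the OR combination yields bits 0, 2 and 3, which matches the three Seq Scans in the expected output; AND over the same arms yields the empty set.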
v37-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch
From 09b4356b9c0d724bb074797c643094d4e6e9bde3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v37 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase, when it's not
yet known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node or
the related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 98 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 94 ++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 105 insertions(+), 221 deletions(-)
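As a rough illustration of the "bubble up" scheme described in the commit message: every partitioned rel starts with a list holding just its own RT index, and each live (non-dummy) child's list is appended to its parent's, so the root ends up listing only the unpruned partitioned tables. This is a minimal sketch under stated assumptions, not the patch's code. SketchRel and sketch_collect are hypothetical, fixed-size arrays stand in for PostgreSQL's List, and the two planner phases (set_append_rel_size and set_append_rel_pathlist) are folded into one recursive function for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_RELS 16

/* Hypothetical, heavily simplified stand-in for RelOptInfo. */
typedef struct SketchRel
{
    int         rti;            /* range table index */
    int         is_partitioned; /* nonzero for a partitioned table */
    int         is_dummy;       /* nonzero if pruned (proven empty) */
    int         partitioned_child_rels[SKETCH_MAX_RELS];
    int         nchild_rels;
    struct SketchRel *children[SKETCH_MAX_RELS];
    int         nchildren;
} SketchRel;

/*
 * Fill rel->partitioned_child_rels with rel's own RT index followed by
 * the indexes collected by its live partitioned descendants.
 */
void
sketch_collect(SketchRel *rel)
{
    int         i;
    int         j;

    if (!rel->is_partitioned)
        return;

    /* Start with this rel's own RT index. */
    rel->nchild_rels = 0;
    rel->partitioned_child_rels[rel->nchild_rels++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
    {
        SketchRel  *child = rel->children[i];

        /* Pruned children contribute nothing. */
        if (child->is_dummy)
            continue;

        sketch_collect(child);

        /* Bubble up the child's partitioned children. */
        for (j = 0; j < child->nchild_rels; j++)
        {
            assert(rel->nchild_rels < SKETCH_MAX_RELS);
            rel->partitioned_child_rels[rel->nchild_rels++] =
                child->partitioned_child_rels[j];
        }
    }
}
```

For a root (RT index 1) with one live sub-partitioned child (RT index 2) and one pruned sub-partitioned child (RT index 4), the root's list comes out as {1, 2}; the pruned table's index never appears.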
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fb1abaceca..75096324f3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2303,21 +2303,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5093,9 +5078,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index ee8d925db1..ebf1827a6b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3186,9 +3176,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index fd80891954..8088039d75 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4073,9 +4064,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ef64040798..e628ff3dc9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -885,6 +885,16 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
live_children = prune_append_rel_partitions(root, rel);
did_pruning = true;
}
+
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
}
/*
@@ -1327,6 +1337,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1337,7 +1353,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1364,49 +1379,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1425,9 +1446,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9c4a1baf5f..20fca97e57 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -571,7 +571,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -586,6 +585,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1128,12 +1128,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1205,10 +1205,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1439,6 +1441,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1539,6 +1545,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1546,7 +1567,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6037,65 +6058,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index b586f941a8..f01119eff1 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -105,8 +105,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1464,9 +1463,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1475,28 +1471,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1571,8 +1546,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1594,8 +1568,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1603,14 +1577,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1637,8 +1603,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ea6d6dd5ae..959fed848b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b687924443..1d801b226f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -671,6 +675,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2123,27 +2128,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1488aebfe9..a11555dd19 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1599,7 +1599,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PasswordType
Path
PathClauseUsage
--
2.11.0
On 17 March 2018 at 01:55, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Hope the attached version is easier to understand.
Hi Amit,
Thanks for making the updates. I'll look at them soon.
I've been thinking about how we're making these improvements for
SELECT only. If planning for an UPDATE or DELETE of a partitioned
table then since the inheritance planner is planning each partition
individually we gain next to nothing from this patch.
Generally, it seems the aim of this patch is to improve the usability
of partitions in an OLTP type workload, most likely OLAP does not
matter as much since planner overhead, in that case, is generally less
of a concern.
I experimented with the attached small patch to see if the situation
could be improved if we first plan the entire query with all
partitions then ignore dummy rels when planning for each individual
partition.
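As a rough illustration of the idea in the paragraph above, here is a toy model in C. It is a minimal sketch, not the planner code: `ToyPartition`, `toy_mark_dummies`, `toy_plan_children` and `toy_total_planner_calls` are hypothetical names, and each counted "call" merely stands in for one grouping_planner() invocation. It assumes list partitions that each accept a single value and a `WHERE a = key` qual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list partition that accepts one value. */
typedef struct ToyPartition
{
    int  accepted_value; /* "partition of listp for values in (accepted_value)" */
    bool is_dummy;       /* set by the whole-query pass when pruned */
} ToyPartition;

/* Phase 1: one pass over all partitions for the qual "a = key". */
void
toy_mark_dummies(ToyPartition *parts, size_t nparts, int key)
{
    for (size_t i = 0; i < nparts; i++)
        parts[i].is_dummy = (parts[i].accepted_value != key);
}

/* Phase 2: plan each child separately, skipping the ones marked dummy. */
size_t
toy_plan_children(const ToyPartition *parts, size_t nparts)
{
    size_t calls = 0;

    for (size_t i = 0; i < nparts; i++)
    {
        if (parts[i].is_dummy)
            continue;
        calls++; /* stands in for one per-child planner call */
    }
    return calls;
}

/* One whole-query pass plus one per-child call for each survivor. */
size_t
toy_total_planner_calls(ToyPartition *parts, size_t nparts, int key)
{
    toy_mark_dummies(parts, nparts, key);
    return 1 + toy_plan_children(parts, nparts);
}
```

Under this model, a single matching partition out of 1024 costs two planner calls instead of 1024, while a table with only one partition costs two calls instead of one, which is consistent with a small overhead at very low partition counts.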
I used something along the lines of:
# create table listp (a int, b int) partition by list(a);
# select 'create table listp'||x||' partition of listp for values
in('||x||');' from generate_series(1, <number of tables>)x;
$ echo "explain update listp set b = 1 where a = 1;" > bench.sql
$ pgbench -f bench.sql -n -T 30 postgres
where <number of tables> started at 1 and went up in powers of 2 until 1024.
Unpatched = your v35 patch
Patched = your v35 + the attached.
The TPS result from a 30-second pgbench run of the above query showed:
Partitions = 1
Unpatched: 7323.3
Patched: 6573.2 (-10.24%)
Partitions = 2
Unpatched: 6784.8
Patched: 6377.1 (-6.01%)
Partitions = 4
Unpatched: 5903.0
Patched: 6106.8 (3.45%)
Partitions = 8
Unpatched: 4582.0
Patched: 5579.9 (21.78%)
Partitions = 16
Unpatched: 3131.5
Patched: 4521.2 (44.38%)
Partitions = 32
Unpatched: 1779.8
Patched: 3387.8 (90.35%)
Partitions = 64
Unpatched: 821.9
Patched: 2245.4 (173.18%)
Partitions = 128
Unpatched: 322.2
Patched: 1319.6 (309.56%)
Partitions = 256
Unpatched: 84.3
Patched: 731.7 (768.27%)
Partitions = 512
Unpatched: 22.5
Patched: 382.8 (1597.74%)
Partitions = 1024
Unpatched: 5.5
Patched: 150.1 (2607.83%)
Which puts the crossover point at just 4 partitions, and just a small
overhead for 1, 2 and probably 3 partitions. The planner generated a
plan 26 times faster (!) with 1024 partitions.
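The percentages quoted above appear to be the relative change in TPS. A minimal sketch of that arithmetic in C follows; `tps_change_pct` and `tps_ratio` are hypothetical helper names. Recomputing from the rounded TPS figures reproduces the low-partition rows but drifts at high counts (about 2629% rather than 2607.83% for 1024 partitions), presumably because the posted percentages were derived from unrounded measurements.

```c
#include <assert.h>

/* Relative change of patched TPS over unpatched TPS, in percent. */
double
tps_change_pct(double unpatched, double patched)
{
    return (patched - unpatched) / unpatched * 100.0;
}

/* How many times as many transactions per second the patched build ran. */
double
tps_ratio(double unpatched, double patched)
{
    return patched / unpatched;
}
```

A negative result means the patched build was slower; the crossover is the first partition count at which the result turns positive, which is 4 in the figures above.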
Likely there's more that could be squeezed out of this if we could get
the grouping_planner() to somehow skip creating paths and performing
the join search. But that patch is not nearly as simple as the
attached.
Probably grouping_planner could also be called with inheritance_update
= false, for this one case too, which might save a small amount of
effort.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
improve_performance_of_inheritance_planner.patch (application/octet-stream)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9351e0cd3b3..080cfd1bdb1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1119,6 +1119,7 @@ inheritance_planner(PlannerInfo *root)
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
+ PlannerInfo *partition_root = NULL;
Assert(parse->commandType != CMD_INSERT);
@@ -1196,6 +1197,33 @@ inheritance_planner(PlannerInfo *root)
* of the ModifyTable node, if one is needed at all.
*/
partitioned_relids = bms_make_singleton(top_parentRTindex);
+
+
+ /*
+ * For partitioned tables, since we're able to determine the minimum
+ * set of partitions required much more easily than what we can do
+ * with an inheritance hierarchy, we invoke the grouping_planner on
+ * the entire given query in order to determine the minimum set of
+ * partitions which will be required below. This may mean that we
+ * invoke the grouping planner far fewer times, as otherwise we'd
+ * have to invoke it once for each partition.
+ */
+
+ /*
+ * Since the planner tends to scribble on the parse, we must make a
+ * copy of it. We also must make copies of the PlannerInfo and
+ * PlannerGlobal since these will also be modified from the call to
+ * grouping_planner.
+ */
+ partition_root = makeNode(PlannerInfo);
+ memcpy(partition_root, root, sizeof(PlannerInfo));
+
+ partition_root->glob = makeNode(PlannerGlobal);
+ memcpy(partition_root->glob, root->glob, sizeof(PlannerGlobal));
+
+ partition_root->parse = copyObject(partition_root->parse);
+
+ grouping_planner(partition_root, true, 0.0 /* retrieve all tuples */ );
}
/*
@@ -1226,6 +1254,21 @@ inheritance_planner(PlannerInfo *root)
if (!bms_is_member(appinfo->parent_relid, parent_relids))
continue;
+ /*
+ * If the target rel is a partitioned table then skip any child
+ * partitions which were found to be dummies by the grouping_planner
+ * call performed above.
+ */
+ if (partition_root)
+ {
+ RelOptInfo *rel;
+
+ rel = find_base_rel(partition_root, appinfo->child_relid);
+
+ if (IS_DUMMY_REL(rel))
+ continue;
+ }
+
/*
* expand_inherited_rtentry() always processes a parent before any of
* that parent's children, so the parent_root for this relation should
@@ -1486,6 +1529,14 @@ inheritance_planner(PlannerInfo *root)
/* Result path must go into outer query's FINAL upperrel */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
+ if (partition_root)
+ {
+ RelOptInfo *partrel;
+
+ partrel = find_base_rel(partition_root, nominalRelation);
+ final_rel->baserestrictinfo = partrel->baserestrictinfo;
+ }
+
/*
* We don't currently worry about setting final_rel's consider_parallel
* flag in this case, nor about allowing FDWs or create_upper_paths_hook
Hi David.
On 2018/03/19 16:18, David Rowley wrote:
On 17 March 2018 at 01:55, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Hope the attached version is easier to understand.
Hi Amit,
Thanks for making the updates. I'll look at them soon.
I've been thinking about how we're making these improvements for
SELECT only. If planning for an UPDATE or DELETE of a partitioned
table then since the inheritance planner is planning each partition
individually we gain next to nothing from this patch.
Unfortunately, yes. :-(
Just recently, I replied to a pgsql-bugs report by someone whose backend
was killed by the OOM killer while running `delete from
partitioned_table_with_7202_partitions` on their test system [1]. That'd
be because running inheritance_planner on a partitioned table doesn't cope
very well beyond a few hundred partitions, as we've also written in our
partitioning/inheritance documentation.
Generally, it seems the aim of this patch is to improve the usability
of partitions in an OLTP type workload, most likely OLAP does not
matter as much since planner overhead, in that case, is generally less
of a concern.
Yes, that makes sense.
I experimented with the attached small patch to see if the situation
could be improved if we first plan the entire query with all
partitions then ignore dummy rels when planning for each individual
partition.
I used something along the lines of:
# create table listp (a int, b int) partition by list(a);
# select 'create table listp'||x||' partition of listp for values
in('||x||');' from generate_series(1, <number of tables>)x;
$ echo "explain update listp set b = 1 where a = 1;" > bench.sql
$ pgbench -f bench.sql -n -T 30 postgres
where <number of tables> started at 1 and went up in powers of 2 until 1024.
Unpatched = your v35 patch
Patched = your v35 + the attached.
The TPS result from a 30-second pgbench run of the above query showed:
Partitions = 1
Unpatched: 7323.3
Patched: 6573.2 (-10.24%)
Partitions = 2
Unpatched: 6784.8
Patched: 6377.1 (-6.01%)
Partitions = 4
Unpatched: 5903.0
Patched: 6106.8 (3.45%)
Partitions = 8
Unpatched: 4582.0
Patched: 5579.9 (21.78%)
Partitions = 16
Unpatched: 3131.5
Patched: 4521.2 (44.38%)
Partitions = 32
Unpatched: 1779.8
Patched: 3387.8 (90.35%)
Partitions = 64
Unpatched: 821.9
Patched: 2245.4 (173.18%)
Partitions = 128
Unpatched: 322.2
Patched: 1319.6 (309.56%)
Partitions = 256
Unpatched: 84.3
Patched: 731.7 (768.27%)
Partitions = 512
Unpatched: 22.5
Patched: 382.8 (1597.74%)
Partitions = 1024
Unpatched: 5.5
Patched: 150.1 (2607.83%)
Which puts the crossover point at just 4 partitions, and just a small
overhead for 1, 2 and probably 3 partitions. The planner generated a
plan 26 times faster (!) with 1024 partitions.
Nice!
Likely there's more that could be squeezed out of this if we could get
the grouping_planner() to somehow skip creating paths and performing
the join search. But that patch is not nearly as simple as the
attached.
Yeah, that'd be nice. Do you think that we cannot fix update/delete on
partitioned tables until we have such a patch though? IOW, did you intend
the patch you posted to just be a PoC to demonstrate that we can save tons
just by not doing grouping_planner() on pruned partitions?
BTW, maybe you know, but if we want this to prune the same partitions as are
pruned during select (due to the new pruning facility), we'd need to teach
get_relation_constraints() to not fetch the partition constraint
(RelationGetPartitionQual) at all. My patch currently teaches it to avoid
fetching the partition constraint only for select. If we include the
partition constraint in the list of constraints returned by
get_relation_constraints, we'd still be redundantly executing the
constraint exclusion logic for the selected partitions via the
grouping_planner() call on those partitions.
Thanks,
Amit
[1]: /messages/by-id/fecdef72-8c2a-0794-8e0a-2ad76db82c68@lab.ntt.co.jp
On 19 March 2018 at 23:03, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Just recently, I replied to a pgsql-bugs report by someone who had OOM
kill a backend running `delete from
partitioned_table_with_7202_partitions` on their test system [1]. That'd
be because running inheritance_planner on a partitioned table doesn't cope
very well beyond a few hundred partitions, as we've also written in our
partitioning/inheritance documentation.
Hmm, yeah, that's unfortunate. I'd not done much study of the
inheritance planner before, but I see how that could happen now that I
understand a bit more about it. On the order of nparts^2 RelOptInfos
will be created for such queries. My patch should help with that,
provided that some pruning actually takes place, but it makes the
problem very slightly worse if nothing can be pruned.
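To put numbers on the nparts^2 estimate, here is a back-of-the-envelope model in C. It is only a sketch under a stated assumption: that every planner pass over the query builds on the order of nparts RelOptInfos. The function names `toy_relopts_unpatched` and `toy_relopts_patched` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Without the PoC patch: one per-child planning run for each of the nparts
 * partitions, each assumed to build about nparts RelOptInfos.
 */
uint64_t
toy_relopts_unpatched(uint64_t nparts)
{
    return nparts * nparts;
}

/*
 * With the PoC patch: one whole-query pass (about nparts RelOptInfos), then
 * one per-child run for each partition that survived pruning.
 */
uint64_t
toy_relopts_patched(uint64_t nparts, uint64_t survivors)
{
    return nparts + survivors * nparts;
}
```

For the 7202-partition table mentioned in the bug report, this model gives roughly 51.9 million RelOptInfos without pruning. With one surviving partition it gives about 14 thousand, and with no pruning at all it gives the unpatched figure plus one extra pass, which matches the "very slightly worse" case.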
On 2018/03/19 16:18, David Rowley wrote:
Likely there's more that could be squeezed out of this if we could get
the grouping_planner() to somehow skip creating paths and performing
the join search. But that patch is not nearly as simple as the
attached.
Yeah, that'd be nice. Do you think that we cannot fix update/delete on
partitioned tables until we have such a patch though? IOW, did you intend
the patch you posted to just be a PoC to demonstrate that we can save tons
just by not doing grouping_planner() on pruned partitions?
The patch was meant as a PoC. I think the performance of the patch is
acceptable without any additional optimisation work. It would be nice,
but any more code that's added would need more reviewer and committer
time, both of which are finite, especially so before PG11 code cut
off.
I think it would be a shame to tell people partitioning is usable now for
a decent number of partitions, providing you don't need to perform any
OLTP UPDATE/DELETE operations on the partitions. I think for the few
lines of code that the proposed patch takes it's worth considering for
PG11, but only once your work has gone in. I certainly wouldn't want
this to hold your work back.
BTW, maybe you know, but if we want this to prune the same partitions as are
pruned during select (due to the new pruning facility), we'd need to teach
get_relation_constraints() to not fetch the partition constraint
(RelationGetPartitionQual) at all. My patch currently teaches it to avoid
fetching the partition constraint only for select. If we include the
partition constraint in the list of constraints returned by
get_relation_constraints, we'd still be redundantly executing the
constraint exclusion logic for the selected partitions via the
grouping_planner() call on those partitions.
I'd not thought of that. It seems more like a performance optimisation
than something that's required for correctness. Removing that would
probably make constraint_exclusion = 'partition' pretty useless.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/03/16 21:55, Amit Langote wrote:
Attached updated patches.
Attached is a further revised version.
Of note is getting rid of PartitionPruneContext usage in the static
functions of partprune.c. Most of the code there ought to only run during
planning, so it can access the necessary information from RelOptInfo
directly instead of copying it to PartitionPruneContext and then passing
it around.
Thanks,
Amit
Attachments:
v38-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From bc8446e160ea90b0a3e7da3f4e1d1e0a505913c4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v38 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..709a00924e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d576aa7350..08a177dac4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v38-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 50d135ed049ebd2ec078dfed9b9913a55d1e68c5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v38 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v38-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 32618c12ee714b8c9fd518f195a673f47d06d75f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v38 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 911 ++++++++++++++++++
src/backend/nodes/copyfuncs.c | 52 +
src/backend/nodes/nodeFuncs.c | 54 ++
src/backend/optimizer/path/allpaths.c | 23 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1273 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 41 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 24 +
src/test/regress/expected/inherit.out | 10 +-
src/test/regress/expected/partition_prune.out | 318 ++++--
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 6 +
18 files changed, 2750 insertions(+), 91 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 786c05df73..fde04604c5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,22 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step,
+ Bitmapset *srcparts);
+static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset *srcparts);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ int partkeyidx, Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1560,9 +1576,904 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions, or NULL if none
+ * survive.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ /* Initially, no partitions have been pruned. */
+ Bitmapset *result = bms_add_range(NULL, 0, context->nparts - 1);
+ ListCell *lc;
+
+ /*
+ * If there are multiple pruning steps, we perform them one after another,
+ * passing the result of one step as input to another. Based on the type
+ * of pruning step, perform_pruning_step may add or remove partitions from
+ * the set of partitions it receives as the input.
+ */
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ result = bms_int_members(result,
+ perform_pruning_step(context, step, result));
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_step
+ * Performs one PartitionPruneStep
+ */
+static Bitmapset *
+perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step,
+ Bitmapset *srcparts)
+{
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepNoop:
+ /* no-op */
+ break;
+
+ case T_PartitionPruneStepOp:
+ return perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+
+ case T_PartitionPruneStepCombine:
+ return perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ srcparts);
+
+ default:
+ elog(ERROR, "invalid partition pruning step: %d", nodeTag(step));
+ break;
+ }
+
+ return srcparts;
+}
+
+/*
+ * perform_pruning_base_step
+ * Returns indexes of partitions as given by get_partitions_for_keys
+ * for information contained in a given PartitionPruneStepOp
+ */
+static Bitmapset *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+
+ nvalues = 0;
+ lc = list_head(opstep->values);
+
+ /*
+ * Generate the partition look-up key.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ if (bms_is_member(keyno, opstep->nullkeys))
+ {
+ if (context->strategy == PARTITION_STRATEGY_HASH)
+ nvalues++;
+ continue;
+ }
+
+ if (keyno > nvalues)
+ break;
+
+ if (lc != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc);
+ if (partkey_datum_from_expr(context, keyno, expr, &datum))
+ values[nvalues++] = datum;
+ lc = lnext(lc);
+ }
+ }
+
+ return get_partitions_for_keys(context,
+ opstep->opstrategy,
+ values, nvalues,
+ opstep->nullkeys);
+}
+
+/*
+ * perform_pruning_combine_step
+ * Returns the set of partitions in 'srcparts' that remain after
+ * performing the pruning "combine" step specified in 'cstep'
+ */
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset *srcparts)
+{
+ ListCell *lc;
+
+ srcparts = bms_copy(srcparts);
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ {
+ Bitmapset *step_parts = NULL;
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ /* Recursively get partitions by performing this step. */
+ argparts = perform_pruning_step(context, step, srcparts);
+ step_parts = bms_add_members(step_parts, argparts);
+ }
+
+ return step_parts;
+ }
+
+ case COMBINE_AND:
+ {
+ /* Initially, no partitions have been pruned. */
+ Bitmapset *step_parts = bms_add_range(NULL, 0,
+ context->nparts - 1);
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argparts;
+
+ argparts = perform_pruning_step(context, step, srcparts);
+ step_parts = bms_int_members(step_parts, argparts);
+ }
+
+ return step_parts;
+ }
+
+ case COMBINE_NOT:
+ {
+ Bitmapset *step_parts = NULL;
+ Datum *ne_datums;
+ int n_ne_datums = list_length(cstep->argvalues),
+ i;
+
+ /*
+ * XXX- The following ad-hoc method of pruning only works for list
+ * partitioning. It checks for each partition if all of its
+ * accepted values appear in ne_datums[].
+ */
+ ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
+ i = 0;
+ foreach(lc, cstep->argvalues)
+ {
+ Expr *expr = lfirst(lc);
+ Datum datum;
+
+ /*
+ * Note that we're passing 0 for partkeyidx, because there can
+ * be only one partition key column for list partitioning.
+ */
+ if (partkey_datum_from_expr(context, 0, expr, &datum))
+ ne_datums[i++] = datum;
+ }
+
+ step_parts = get_partitions_excluded_by_ne_datums(context,
+ ne_datums,
+ n_ne_datums);
+ return bms_del_members(srcparts, step_parts);
+ }
+
+ default:
+ /* Return the source partitions as is; should never happen. */
+ break;
+ }
+
+ return srcparts;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context, int partkeyidx,
+ Expr *expr, Datum *value)
+{
+ Oid exprTyp = exprType((Node *) expr);
+
+ if (context->partopcintype[partkeyidx] != exprTyp)
+ {
+ Oid new_supfuncid;
+ int16 procnum;
+
+
+ procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+ new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
+ context->partopcintype[partkeyidx],
+ exprTyp, procnum);
+ fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
+ }
+
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys
+ * Returns the indexes of partitions selected by the
+ * given look-up keys
+ *
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
+ *
+ * Outputs:
+ * Bitmapset containing indexes of the selected partitions
+ */
+static Bitmapset *
+get_partitions_for_keys(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes,
+ default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ Bitmapset *result = NULL;
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ {
+ uint64 rowHash;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i,
+ greatest_modulus,
+ result_index;
+
+ memset(isnull, false, partnatts * sizeof(bool));
+ if (!bms_is_empty(nullkeys))
+ {
+ i = -1;
+ while ((i = bms_next_member(nullkeys, i)) >= 0)
+ {
+ Assert(i < partnatts);
+ isnull[i] = true;
+ }
+ }
+
+ /*
+ * In this case, can only do pruning if we know values for all
+ * the keys and they're all non-null.
+ */
+ if (nvalues == context->partnatts)
+ {
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values,
+ isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ break;
+ }
+
+ case PARTITION_STRATEGY_LIST:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default
+ * partition if the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(boundinfo->default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * For range queries, always include the default list partition,
+ * because list partitions divide the key space in a discontinuous
+ * manner, not all values in the given range will have a partition
+ * assigned. This may not technically be true for some data types
+ * (e.g. integer types), however, we currently lack any sort of
+ * infrastructure to provide us with proofs that would allow us to
+ * do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+ }
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0)
+ {
+ /*
+ * We don't want the matched datum to be in the
+ * result.
+ */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are
+ * greater, which in turn means that all
+ * partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have
+ * partitions for. The only possible partition that
+ * could contain a match is the default partition.
+ * Return that, if it exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ minoff = off;
+ break;
+ }
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ {
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default
+ * partitions, meaning there isn't one to return.
+ * Return the default partition if one exists.
+ */
+ if (off < 0)
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(default_index)
+ : NULL;
+
+ maxoff = off;
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid btree operator strategy: %d",
+ opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+ }
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ {
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ /* Nulls may exist in only the default range partition */
+ if (!bms_is_empty(nullkeys))
+ return partition_bound_has_default(boundinfo)
+ ? bms_make_singleton(boundinfo->default_index)
+ : NULL;
+
+ /*
+ * If there are no datums to compare keys with, but there are
+ * partitions, just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null
+ * values and return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off+1] >= 0)
+ return bms_make_singleton(partindices[off+1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off-1],
+ boundinfo->kind[off-1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off+1],
+ boundinfo->kind[off+1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off+1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off+1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+ }
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * include all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Matched a prefix of the partition bound at off.
+ */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+ }
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ {
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /*
+ * Matched prefix of the partition bound at off.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so
+ * select none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid btree operator strategy: %d",
+ opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default
+ * range partition, we must include the default partition if
+ * some keys could be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ context->strategy);
+ break;
+ }
+
+ return result;
+}
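
To make the offset arithmetic in the list-partitioning inequality cases above easier to follow, here is a minimal standalone sketch. It is not code from the patch: it assumes plain sorted int datums in place of Datum values, and the names bound_bsearch(), list_bound_range() and CmpOp are hypothetical stand-ins for partition_list_bsearch() and the btree strategy numbers. The default partition is left out; a false return corresponds to the "return the default partition, if any" exits above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the four btree inequality strategies. */
typedef enum { CMP_LT, CMP_LE, CMP_GE, CMP_GT } CmpOp;

/* Greatest index i with datums[i] <= value, or -1 if there is none. */
static int
bound_bsearch(const int *datums, int ndatums, int value, bool *is_equal)
{
	int		lo = -1;
	int		hi = ndatums - 1;

	while (lo < hi)
	{
		int		mid = lo + (hi - lo + 1) / 2;

		if (datums[mid] <= value)
			lo = mid;
		else
			hi = mid - 1;
	}
	*is_equal = (lo >= 0 && datums[lo] == value);
	return lo;
}

/*
 * Compute the inclusive range [*minoff, *maxoff] of bound offsets that can
 * satisfy "key op value". Returns false if no non-default bound qualifies,
 * in which case the output offsets are not meaningful.
 */
bool
list_bound_range(const int *datums, int ndatums, CmpOp op, int value,
				 int *minoff, int *maxoff)
{
	bool	is_equal;
	bool	inclusive = (op == CMP_LE || op == CMP_GE);
	int		off;

	assert(ndatums >= 0);
	off = bound_bsearch(datums, ndatums, value, &is_equal);
	*minoff = 0;
	*maxoff = ndatums - 1;

	if (op == CMP_GE || op == CMP_GT)
	{
		if (off < 0)
			off = 0;			/* every bound is greater than value */
		else if (!is_equal || !inclusive)
			off++;				/* bound at off itself does not qualify */
		if (off > ndatums - 1)
			return false;
		*minoff = off;
	}
	else
	{
		if (off >= 0 && is_equal && !inclusive)
			off--;				/* exclude the equal bound for "<" */
		if (off < 0)
			return false;
		*maxoff = off;
	}
	return true;
}
```

With bounds {10, 20, 30}, "key > 20" would yield the range [2, 2] and "key < 10" would yield no non-default bound at all.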
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc, partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /*
+ * No partitions can be excluded if none of the partitions accept the
+ * datums in ne_datums[].
+ */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
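
The counting scheme used by get_partitions_excluded_by_ne_datums() above can be sketched outside the backend as follows. This is only an illustration under simplifying assumptions: plain int and bool arrays replace Bitmapset and PartitionBoundInfo, the function name count_excluded_by_ne() is hypothetical, and the special handling of the NULL-accepting partition is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * datum_part[i] is the index of the partition that accepts the i-th list
 * bound datum; ne_found[i] is true if the query has a "key <> datum_i"
 * clause. On return, excluded[p] is true for every partition all of whose
 * datums were named in <> clauses. A partition with no datums (such as
 * the default partition) is never excluded.
 *
 * Returns the number of excluded partitions, or -1 if allocation fails.
 */
int
count_excluded_by_ne(const int *datum_part, const bool *ne_found,
					 int ndatums, int nparts, bool *excluded)
{
	int	   *in_part;
	int	   *found;
	int		i;
	int		nexcluded = 0;

	assert(ndatums >= 0 && nparts > 0);

	in_part = calloc((size_t) nparts, sizeof(int));
	found = calloc((size_t) nparts, sizeof(int));
	if (in_part == NULL || found == NULL)
	{
		free(in_part);
		free(found);
		return -1;
	}

	/* One pass over the bounds fills both per-partition counters. */
	for (i = 0; i < ndatums; i++)
	{
		in_part[datum_part[i]]++;
		if (ne_found[i])
			found[datum_part[i]]++;
	}

	/* Exclude partitions whose every permitted datum was ruled out. */
	for (i = 0; i < nparts; i++)
	{
		excluded[i] = (found[i] > 0 && found[i] >= in_part[i]);
		if (excluded[i])
			nexcluded++;
	}

	free(in_part);
	free(found);
	return nexcluded;
}
```

The "found[i] > 0" test is what protects the default partition, which has a zero count in both arrays.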
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3ad4da64aa..e8174d597f 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,6 +2132,49 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepNoop
+ */
+static PartitionPruneStepNoop *
+_copyPartitionPruneStepNoop(const PartitionPruneStepNoop *from)
+{
+ PartitionPruneStepNoop *newnode = makeNode(PartitionPruneStepNoop);
+
+ /* Nothing to copy. */
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(values);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(argsteps);
+ COPY_NODE_FIELD(argvalues);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5023,6 +5066,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepNoop:
+ retval = _copyPartitionPruneStepNoop(from);
+ break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..9f50552e4e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,27 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepNoop:
+ /* No sub-structure. */
+ return true;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->values, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+
+ if (walker((Node *) cstep->argsteps, context))
+ return true;
+ if (walker((Node *) cstep->argvalues, context))
+ return true;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2953,39 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepNoop:
+ {
+ PartitionPruneStepNoop *noopstep = (PartitionPruneStepNoop *) node;
+ PartitionPruneStepNoop *newnode;
+
+ FLATCOPY(newnode, noopstep, PartitionPruneStepNoop);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->values, opstep->values, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+ PartitionPruneStepCombine *newnode;
+
+ FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
+ MUTATE(newnode->argsteps, cstep->argsteps, List *);
+ MUTATE(newnode->argvalues, cstep->argvalues, List *);
+
+ return (Node *) newnode;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8735e29807..ef64040798 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -869,12 +870,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ if (rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(root, rel);
+ did_pruning = true;
+ }
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1123,6 +1135,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) && did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
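
The set_append_rel_size() hunks above carry a separate did_pruning flag because a NULL live_children set is ambiguous on its own: it can mean "pruning was not attempted" or "pruning ran and nothing survived". A minimal sketch of that check, assuming a 64-bit mask in place of Relids and a hypothetical function name, not the planner's actual data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the child with the given RT index still has to be
 * considered after pruning. live_children is a bitmask of surviving RT
 * indexes and is only meaningful when did_pruning is true.
 */
bool
child_survives_pruning(bool did_pruning, uint64_t live_children,
					   int child_relid)
{
	assert(child_relid >= 0 && child_relid < 64);

	/* No pruning attempted: every child must still be considered. */
	if (!did_pruning)
		return true;

	/* Pruning ran: an empty set now means that no child survives. */
	return ((live_children >> child_relid) & 1) != 0;
}
```

A child that survives this check is still subject to the relation_excluded_by_constraints() test that follows in the patched function.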
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..aa18cc2ae4
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1273 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being compared to */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals, or NULL if no partitions exist or none can satisfy the quals.
+ *
+ * Only call this if 'rel' corresponds to a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent,
+ * then it's possible that the partitioning hierarchy allows the
+ * parent partition to only contain a narrower range of values than
+ * the sub-partitioned table does. In this case it is possible that
+ * we'd include partitions that could not possibly have any tuples
+ * matching 'clauses'. The possibility of such a partition
+ * arrangement is perhaps unlikely for non-default partitions, but
+ * it may be more likely in the case of default partitions, so we'll
+ * add the parent partition table's partition qual to the clause list
+ * in this case only. This may result in the default partition being
+ * eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ return generate_partition_pruning_steps_internal(rel, clauses,
+ constfalse);
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * For each operator clause that's matched with a partition key, we generate
+ * a PartitionPruneStepOp containing relevant details of the operator and
+ * the expression whose value to use for comparison against partition bounds.
+ *
+ * If we encounter an OR clause, we generate a PartitionPruneStepCombine whose
+ * arguments are other partition pruning steps, each of which might be a
+ * PartitionPruneStepOp or another PartitionPruneStepCombine.
+ *
+ * If we find a RestrictInfo that's marked as pseudoconstant and contains a
+ * constant false value for clause, we stop generating any further steps and
+ * return NIL (no pruning steps) after setting *constfalse to true.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers
+ * that still need the original list should pass a copy.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber],
+ *ne_clauses = NIL;
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool foundkeyclause = false;
+ List *steps = NIL;
+ ListCell *lc;
+ int i;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *all_arg_steps = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ /*
+ * If arg was a clause matching this partition key, we'd
+ * get back the corresponding pruning step.
+ */
+ if (argsteps != NIL)
+ {
+ Assert(list_length(argsteps) == 1);
+ all_arg_steps = lappend(all_arg_steps,
+ linitial(argsteps));
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such
+ * an arg. To indicate that to the pruning code,
+ * we must construct a PartitionPruneStepNoop and
+ * append it as an argument of the OR pruning combine
+ * step. However, if we can prove using constraint
+ * exclusion that the clause refutes the table's
+ * partition constraint (if it's sub-partitioned),
+ * we need not bother with that.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStepNoop *noop;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ noop = makeNode(PartitionPruneStepNoop);
+ all_arg_steps = lappend(all_arg_steps, noop);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_OR;
+ combineStep->argsteps = all_arg_steps;
+ combineStep->argvalues = NIL;
+ steps = lappend(steps, combineStep);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps;
+
+ argsteps = generate_partition_pruning_steps_internal(rel,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_AND;
+ combineStep->argsteps = argsteps;
+ combineStep->argvalues = NIL;
+ steps = lappend(steps, combineStep);
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ foundkeyclause = true;
+ Assert(pc != NULL);
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+
+ foundkeyclause = true;
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ steps = list_concat(steps, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* Nothing to do here. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /* go check for the next key. */
+ break;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /* There was nothing but combining steps in the clauses we got. */
+ if (!foundkeyclause)
+ return steps;
+
+ /*
+ * Now we have one list of clauses per partition key. To be useful for
+ * pruning, we must have clauses for a prefix of partition keys in the
+ * case of range partitioning. For hash partitioning, if a column doesn't
+ * have the necessary equality clause, there must be an IS NULL clause for
+ * it; otherwise pruning is not possible.
+ */
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ need_next_less = true;
+ need_next_eq = true;
+ need_next_greater = true;
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NIL;
+
+ if (clauselist == NIL &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ break;
+
+ if (!(need_next_less || need_next_eq || need_next_greater))
+ break;
+
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used for
+ * pruning if this is the first such key for this operator
+ * strategy or if it is consecutively next to the last
+ * column for which a clause with this operator strategy
+ * was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * Depending on this clause's strategy, we may not need
+ * clauses for the next key.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+ }
+
+ /*
+ * Generate actual steps for various operator strategies by generating
+ * tuples of values, possibly multiple per operator strategy.
+ *
+ * XXX - add more description
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each non-equality strategy, generate tuples of values such
+ * that each tuple's non-last values come from an equality clause.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ pc = lfirst(lc);
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ if (prefix == NIL && pc->keyno > 0)
+ continue;
+
+ /*
+ * Considering pc->value as the last value in the pruning
+ * tuple, try to generate pruning steps for tuples
+ * containing various combinations of values for earlier
+ * columns from the clauses in prefix.
+ */
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ NULL,
+ prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ PartClauseInfo *pc;
+ ListCell *lc1;
+
+ if (eq_clauses != NIL)
+ {
+ pc = llast(eq_clauses);
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+
+ prefix = lappend(prefix, pc);
+ }
+
+ for_each_cell(lc1, lc)
+ {
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ steps = list_concat(steps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Combine values from all <> operator clauses into one prune step. */
+ if (ne_clauses != NIL)
+ {
+ List *argvalues = NIL;
+ PartitionPruneStepCombine *combineStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ argvalues = lappend(argvalues, pc->value);
+ }
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_NOT;
+ combineStep->argsteps = NIL;
+ combineStep->argvalues = argvalues;
+ steps = lappend(steps, combineStep);
+ }
+
+ /*
+ * Generate one prune step for the information derived from IS NULL and
+ * IS NOT NULL clauses. Note that for IS NOT NULL clauses, simply having the
+ * step suffices; there is no need to propagate the exact details of which
+ * keys are required to be NOT NULL.
+ */
+ if (!bms_is_empty(nullkeys) || !bms_is_empty(notnullkeys))
+ {
+ PartitionPruneStepOp *opstep;
+
+ opstep = makeNode(PartitionPruneStepOp);
+ opstep->nullkeys = nullkeys;
+ steps = lappend(steps, opstep);
+ }
+
+ return steps;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match some other key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all
+ * even if it may have been matched with a key due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *value;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &value))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) && list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ value = rightop;
+ else if (equal(rightop, partkey))
+ {
+ value = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified
+ * for it, so try to match it too. There may be multiple keys
+ * with the same expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of
+ * the clause to see if it's sane to use it for pruning. If
+ * any of the properties makes it unsuitable for pruning, then
+ * the clause is useless no matter which key it's matched to.
+ */
+
+ /* Only allow strict operators. This will guarantee nulls are filtered. */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) value))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator is
+ * a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the
+ * clause to see if it can sanely be used for partition
+ * pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if
+ * list partitioning is in use and we're able to confirm that
+ * its negator is a btree equality operator belonging to the
+ * partitioning operator family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the
+ * actual scalar values out into a flat list, so we give
+ * up doing anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element,
+ * of the form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the
+ * clauses to the end of the list that's being processed
+ * currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause (or its negation) matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ *
+ * XXX - add comment
+ */
+static List *
+get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix lastvalue with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
+
+ step->opstrategy = step_opstrategy;
+ step->values = list_make1(step_lastvalue);
+ step->nullkeys = step_nullkeys;
+
+ return list_make1(step);
+ }
+
+ return get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ list_head(prefix),
+ NIL);
+}
+
+static List *
+get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int step_keyno;
+
+ Assert(start != NULL);
+ step_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (step_keyno == step_lastkeyno - 1)
+ {
+ Assert(list_length(step_values) == step_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStepOp *step;
+ List *step_values1;
+
+ if (pc->keyno > step_keyno)
+ break;
+
+ step_values1 = list_copy(step_values);
+ step_values1 = lappend(step_values1, pc->value);
+ step_values1 = lappend(step_values1, step_lastvalue);
+
+ step = makeNode(PartitionPruneStepOp);
+ step->opstrategy = step_opstrategy;
+ step->values = step_values1;
+ step->nullkeys = step_nullkeys;
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > step_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == 0)
+ {
+ list_free(step_values);
+ step_values = list_make1(pc->value);
+ }
+ else if (pc->keyno == step_keyno)
+ step_values = lappend(step_values, pc->value);
+ else
+ break;
+
+ result =
+ list_concat(result,
+ list_copy(get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ next_start,
+ step_values)));
+ }
+ }
+
+ return result;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 709a00924e..e272c445bf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..a55b8a84dd 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +95,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..ea6d6dd5ae 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -191,6 +191,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepNoop,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..47ac3da77a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,45 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionPruneStep - base type for nodes representing a partition pruning
+ * step
+ *----------
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+} PartitionPruneStep;
+
+/* a no-op step that doesn't prune any of the partitions. */
+typedef struct PartitionPruneStepNoop
+{
+ PartitionPruneStep step;
+} PartitionPruneStepNoop;
+
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *values;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND,
+ COMBINE_NOT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *argsteps;
+ List *argvalues;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 08a177dac4..b687924443 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -663,6 +665,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..fbc26ec8ab
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,24 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(PlannerInfo *root,
+ RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index d768dc0215..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1739,11 +1739,7 @@ explain (costs off) select * from list_parted where a = 'ab' or a in (null, 'cd'
Append
-> Seq Scan on part_ab_cd
Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_ef_gh
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
- -> Seq Scan on part_null_xy
- Filter: (((a)::text = 'ab'::text) OR ((a)::text = ANY ('{NULL,cd}'::text[])))
-(7 rows)
+(3 rows)
explain (costs off) select * from list_parted where a = 'ab';
QUERY PLAN
@@ -1930,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..d75a23e4a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1089,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1103,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1125,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1141,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1153,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1276,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1339,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
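
A note on the hash tests above: a partition is pruned only when every key column is pinned by an equality or IS NULL clause. The following is a rough, self-contained sketch of why that is, not the server's code. toy_hash(), toy_combine() and the other toy_* names are hypothetical stand-ins, and the sketch assumes (as I believe the server does, and as the (null, null) row landing in hp0 suggests) that NULL key columns are skipped when the row hash is built. Since the remainder is computed from the combined hash of all key columns, leaving any one column unconstrained leaves every remainder possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a per-column hash function (not PostgreSQL's). */
static uint64_t
toy_hash(uint64_t v)
{
	v ^= v >> 33;
	v *= UINT64_C(0xff51afd7ed558ccd);
	v ^= v >> 33;
	return v;
}

/* Toy stand-in for folding one column hash into the running row hash. */
static uint64_t
toy_combine(uint64_t a, uint64_t b)
{
	return a ^ (b + UINT64_C(0x9e3779b97f4a7c15) + (a << 6) + (a >> 2));
}

/*
 * Return the remainder (partition number) for a row with nkeys key columns.
 * NULL columns are skipped, so an all-NULL key hashes to 0 (assumption).
 */
static int
toy_hash_partition(const uint64_t *values, const bool *isnull,
				   int nkeys, int modulus)
{
	uint64_t	row_hash = 0;

	assert(modulus > 0);
	for (int i = 0; i < nkeys; i++)
	{
		if (!isnull[i])
			row_hash = toy_combine(row_hash, toy_hash(values[i]));
	}
	return (int) (row_hash % (uint64_t) modulus);
}

/*
 * A single partition can be chosen at plan time only if every key column
 * is pinned to one value (or to NULL) by the query's clauses.
 */
static bool
toy_can_prune(const bool *key_is_pinned, int nkeys)
{
	for (int i = 0; i < nkeys; i++)
	{
		if (!key_is_pinned[i])
			return false;
	}
	return true;
}
```

With two key columns, toy_can_prune() is false for "a = 1" alone and true for "a = 1 and b is null", which mirrors the split between the two groups of hash tests.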
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d4765ce3b0..1488aebfe9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1575,6 +1575,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1587,6 +1588,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepNoop
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
--
2.11.0
Attachment: v38-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 01b12edd764b664b7f233e4a3ec51bc89a5b3a3e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v38 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The (Merge)Append and ModifyTable planner nodes each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
the related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 98 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 94 ++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 105 insertions(+), 221 deletions(-)
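
To illustrate the idea in the commit message, here is a toy model of how the allpaths.c hunks of this patch build the list: each partitioned rel starts its list with its own RT index, and each live (unpruned) child's list is then appended to its parent's. This is only a sketch under simplifying assumptions. ToyRel and toy_collect_partitioned_rels() are hypothetical names, fixed-size arrays stand in for the planner's List type, and the two phases (set_append_rel_size() and set_append_rel_pathlist()) are folded into one recursive walk.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RELS 16

/* Hypothetical, much-simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
	int			rti;			/* range table index */
	bool		partitioned;	/* is this a partitioned table? */
	bool		pruned;			/* was this rel proven empty (dummy)? */
	struct ToyRel *children[TOY_MAX_RELS];
	int			nchildren;
	int			partitioned_child_rels[TOY_MAX_RELS];
	int			nprels;
} ToyRel;

/*
 * Fill rel->partitioned_child_rels with rel's own index followed by the
 * indexes of all unpruned partitioned descendants.  Leaf partitions
 * contribute nothing, and pruned subtrees are never visited.
 */
static void
toy_collect_partitioned_rels(ToyRel *rel)
{
	rel->nprels = 0;
	if (!rel->partitioned)
		return;

	/* Start with this rel's own RT index. */
	rel->partitioned_child_rels[rel->nprels++] = rel->rti;

	for (int i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		if (child->pruned)
			continue;

		/* Bubble up the child's partitioned children. */
		toy_collect_partitioned_rels(child);
		for (int j = 0; j < child->nprels; j++)
		{
			assert(rel->nprels < TOY_MAX_RELS);
			rel->partitioned_child_rels[rel->nprels++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

For a root (RT index 1) with one live sub-partitioned child (2) and one pruned sub-partitioned child (4), the root's list ends up as {1, 2}. The old pcinfo_list approach would have recorded 4 as well.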
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e8174d597f..2f1d5bd2a1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2303,21 +2303,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5094,9 +5079,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index fd80891954..8088039d75 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2229,7 +2229,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2254,6 +2253,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2303,6 +2303,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2528,16 +2529,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4073,9 +4064,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ef64040798..e628ff3dc9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -885,6 +885,16 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
live_children = prune_append_rel_partitions(root, rel);
did_pruning = true;
}
+
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down
+ * in the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ rel->partitioned_child_rels = list_make1_int(rti);
}
/*
@@ -1327,6 +1337,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1337,7 +1353,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1364,49 +1379,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop below
+ * will look for such children and collect them in a list to be passed to
+ * the path creation function. (This assumes that we don't need to look
+ * through multiple levels of subquery RTEs; if we ever do, we could
+ * consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1425,9 +1446,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9c4a1baf5f..20fca97e57 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -571,7 +571,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -586,6 +585,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1128,12 +1128,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1205,10 +1205,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels
+ * of the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1439,6 +1441,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1539,6 +1545,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1546,7 +1567,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6037,65 +6058,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
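
A note on the inheritance_planner() hunks above: the parent's RT index is added once for every non-dummy child, so the same index shows up many times. Collecting the indexes in a set (Relids) absorbs the duplicates, and the set is walked in ascending order only at the end to build the partitioned_rels list. The sketch below mimics that with a 64-bit mask. toy_add_member() and toy_set_to_list() are hypothetical names, and the 64-member limit is an assumption of the toy, not of Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Add one RT index to the set; adding it again changes nothing. */
static uint64_t
toy_add_member(uint64_t set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

/*
 * Write the set's members to out[] in ascending order and return how
 * many were written.  'cap' is the capacity of out[].
 */
static int
toy_set_to_list(uint64_t set, int *out, int cap)
{
	int			n = 0;

	for (int rti = 0; rti < 64; rti++)
	{
		if (set & (UINT64_C(1) << rti))
		{
			assert(n < cap);
			out[n++] = rti;
		}
	}
	return n;
}
```

Feeding it the parent indexes 1, 1, 1, 3, 3, 1 (one entry per surviving child) produces the two-element list {1, 3}.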
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index f087369f75..c86350fd1e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1498,9 +1497,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1509,28 +1505,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1605,8 +1580,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1628,8 +1602,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1637,14 +1611,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1671,8 +1637,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ea6d6dd5ae..959fed848b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b687924443..1d801b226f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -671,6 +675,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2123,27 +2128,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1488aebfe9..a11555dd19 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1599,7 +1599,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PasswordType
Path
PathClauseUsage
--
2.11.0
On 21 March 2018 at 00:07, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is further revised version.
In the 0004 patch I see:
@@ -1439,6 +1441,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
This seems to execute regardless of if the target relation is a
partitioned table or an inheritance parent. I think there needs to be
a condition so you only do this when planning for partitioned tables.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 20, 2018 at 7:07 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/03/16 21:55, Amit Langote wrote:
Attached updated patches.
Attached is further revised version.
Of note is getting rid of PartitionPruneContext usage in the static
functions of partprune.c. Most of the code there ought to only run during
planning, so it can access the necessary information from RelOptInfo
directly instead of copying it to PartitionPruneContext and then passing
it around.
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ if (rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(root, rel);
+ did_pruning = true;
+ }
+ }
Use &&
+ case COMBINE_OR:
+ {
Won't survive pgindent, which currently produces a *massive* diff for
these patches.
+ /*
+ * XXX- The following ad-hoc method of pruning only works for list
+ * partitioning. It checks for each partition if all of its
+ * accepted values appear in ne_datums[].
+ */
So why are we doing it this way? How about doing something not
ad-hoc? I tried to propose that before.
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
These comments need updating, since this function (laudably) no longer
does any evaluating. I wonder how this will work for run-time
pruning, though.
+ if (context->partopcintype[partkeyidx] != exprTyp)
+ {
+ Oid new_supfuncid;
+ int16 procnum;
+
+
+ procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+ new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
+ context->partopcintype[partkeyidx],
+ exprTyp, procnum);
+ fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
+ }
What's the point of this, exactly? Leftover dead code, maybe?
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
There's no such thing as PartScanKeyInfo any more and the function has
no argument called 'keys'. None of the function's actual arguments are
explained.
+ /*
+ * If there are multiple pruning steps, we perform them one after another,
+ * passing the result of one step as input to another. Based on the type
+ * of pruning step, perform_pruning_step may add or remove partitions from
+ * the set of partitions it receives as the input.
+ */
The comment sounds great, but the code doesn't work that way; it
always calls bms_int_members to intersect the new result with any
previous result. I'm baffled as to how this manages to DTRT if
COMBINE_OR is used. In general I had hoped that the list of pruning
steps was something over which we were only going to iterate, not
recurse. This definitely recurses for the combine steps, but it's
still (sorta) got the idea of a list of iterable steps. That's a
weird mix.
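One way to read the "iterate, not recurse" idea is a flat array of steps in which every combine step refers to the results of earlier steps by index. The following is only a sketch under that assumption, not code from the patch: a uint64_t bit mask stands in for a Bitmapset, and PruneStep, STEP_BASE, STEP_AND, STEP_OR and run_steps are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step kinds; the patch uses node types instead. */
typedef enum { STEP_BASE, STEP_AND, STEP_OR } StepKind;

#define MAX_STEPS 16
#define MAX_ARGS  4

typedef struct PruneStep
{
    StepKind    kind;
    uint64_t    matches;        /* STEP_BASE: partitions the clause matches */
    int         nargs;          /* combine steps: number of source steps */
    int         args[MAX_ARGS]; /* indexes of earlier steps to combine */
} PruneStep;

/*
 * Evaluate the steps in array order.  A combine step only reads results of
 * steps that appear before it, so a single pass suffices and nothing
 * recurses.  The result of the last step is the final answer.
 */
uint64_t
run_steps(const PruneStep *steps, int nsteps, int nparts)
{
    uint64_t    all = (nparts >= 64) ? UINT64_MAX
                                     : ((UINT64_C(1) << nparts) - 1);
    uint64_t    results[MAX_STEPS];
    int         i, j;

    assert(nsteps <= MAX_STEPS);
    if (nsteps == 0)
        return all;             /* no steps: every partition survives */

    for (i = 0; i < nsteps; i++)
    {
        const PruneStep *s = &steps[i];

        assert(s->nargs <= MAX_ARGS);
        switch (s->kind)
        {
            case STEP_BASE:
                results[i] = s->matches & all;
                break;
            case STEP_AND:
                results[i] = all;
                for (j = 0; j < s->nargs; j++)
                {
                    assert(s->args[j] < i);
                    results[i] &= results[s->args[j]];
                }
                break;
            case STEP_OR:
                results[i] = 0;
                for (j = 0; j < s->nargs; j++)
                {
                    assert(s->args[j] < i);
                    results[i] |= results[s->args[j]];
                }
                break;
            default:
                results[i] = all;   /* unknown step: prune nothing */
                break;
        }
    }
    return results[nsteps - 1];
}
```

With two base steps matching partitions {0,1} and {1,2}, a third step of kind STEP_AND over steps 0 and 1 yields {1}, and the same step as STEP_OR yields {0,1,2}.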
+ if (nvalues == context->partnatts)
+ {
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values,
+ isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ /* Can't do pruning otherwise, so return all partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
Wouldn't we want to (1) arrange things so that this function is never
called if nvalues < context->partnatts && context->strategy ==
PARTITION_STRATEGY_HASH or at least (2) avoid constructing isnull from
nullkeys if we're not going to use it?
Also, shouldn't we be sanity-checking the strategy number here?
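As a rough illustration of the two points above, the sketch below checks the strategy and the number of known keys before doing any other work, so nothing like isnull[] is built when pruning is impossible. It is a simplified model, not the patch's code: a uint64_t mask replaces the Bitmapset, the row hash is assumed to be already computed, and hash_prune and SKETCH_EQUAL_STRATEGY are hypothetical names. Returning "all partitions" for an unexpected strategy is also just an assumption; erroring out would be another reasonable choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hash equality strategy number. */
#define SKETCH_EQUAL_STRATEGY 1

/*
 * Return a mask of partitions that can match.  'partindices' maps each
 * remainder (0 .. greatest_modulus - 1) to a partition index, or -1 if no
 * partition accepts that remainder.
 */
uint64_t
hash_prune(int opstrategy, int nvalues, int partnatts,
           uint64_t rowhash, const int *partindices, int greatest_modulus,
           int nparts)
{
    uint64_t    all = (nparts >= 64) ? UINT64_MAX
                                     : ((UINT64_C(1) << nparts) - 1);
    int         result_index;

    assert(greatest_modulus > 0);

    /* Only an equality (or IS NULL) clause on every key identifies a row hash. */
    if (opstrategy != SKETCH_EQUAL_STRATEGY || nvalues != partnatts)
        return all;

    result_index = partindices[rowhash % (uint64_t) greatest_modulus];

    /* A remainder with no partition means no partition can match. */
    return (result_index >= 0) ? (UINT64_C(1) << result_index) : 0;
}
```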
I'm out of time for right now but it looks to me like this patch still
needs quite a bit of fine-tuning.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/03/20 21:41, David Rowley wrote:
On 21 March 2018 at 00:07, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is further revised version.
In the 0004 patch I see:
@@ -1439,6 +1441,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
This seems to execute regardless of if the target relation is a
partitioned table or an inheritance parent. I think there needs to be
a condition so you only do this when planning for partitioned tables.
Oops, that's quite wrong. Will fix, thanks.
Regards,
Amit
On 21 March 2018 at 00:07, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is further revised version.
Hi Amit,
Thanks for sending the v38 patch.
I started reviewing this, but I just ended up starting to hack at the
patch instead. It still needs quite a bit of work as, unfortunately, I
think the cross-type stuff is still pretty broken. The code does not
really check that it has found a valid cross-type hash or comparison
function, which results in errors like:
create table hashp (a int, b numeric) partition by hash(a,b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 2);
create table hashp4 partition of hashp for values with (modulus 4, remainder 3);
explain select * from hashp where a = 3::smallint and b = 1.0;
ERROR: cache lookup failed for function 0
I'm not really sure this is just a matter of adding
if (!OidIsValid(new_supfuncid)) return false;
because I think context->partsupfunc must be pretty broken in cases like:
create table listp (a bigint) partition by list(a);
create table listp1 partition of listp for values in(1);
select * from listp where a <> 1::smallint and a <> 1::bigint;
The current patch simply remembers the last comparison function
for comparing int8 to int4 and uses that one for the int8 to int2
comparison too.
Probably we need to cache the comparison function's Oid along with the
Expr in the step and use the correct one each time. I'm unsure of how
the fmgr info should be cached, but it looks like it certainly cannot be
cached in the context in an array per partition key. So far I've only
thought of some sort of hash table, but I'm sure there must be a much
better way to do this.
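A minimal sketch of the "cache it in the step" idea follows. It is not the patch's code: plain C function pointers stand in for FmgrInfo, and ValType, CmpFn, CmpStep, lookup_cmp and init_step are hypothetical names. Each step resolves its own comparison function from the type of its value, and a failed lookup is reported instead of being called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value types for a bigint partition key. */
typedef enum { VAL_INT2, VAL_INT8, VAL_UNSUPPORTED } ValType;

/* Compares the key with a value of the step's own type. */
typedef int (*CmpFn) (int64_t key, const void *value);

int
cmp_int8_int2(int64_t key, const void *value)
{
    int16_t     v = *(const int16_t *) value;

    return (key > v) - (key < v);
}

int
cmp_int8_int8(int64_t key, const void *value)
{
    int64_t     v = *(const int64_t *) value;

    return (key > v) - (key < v);
}

/* Returns NULL when no cross-type function exists; callers must check. */
CmpFn
lookup_cmp(ValType t)
{
    switch (t)
    {
        case VAL_INT2:
            return cmp_int8_int2;
        case VAL_INT8:
            return cmp_int8_int8;
        default:
            return NULL;
    }
}

typedef struct CmpStep
{
    const void *value;
    CmpFn       cmp;            /* resolved once, for this step's type only */
} CmpStep;

/* Returns 0 if no comparison function exists, so the step must not be used. */
int
init_step(CmpStep *step, ValType t, const void *value)
{
    step->value = value;
    step->cmp = lookup_cmp(t);
    return step->cmp != NULL;
}
```

Because the function lives in the step, a smallint step and a bigint step on the same key never overwrite each other's choice.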
I started hacking at partition.c and ended up changing quite a few
things. I changed get_partitions_for_keys into 3 separate functions,
one each for hash, list and range, and tidied a few things up in that area.
There were a few bugs, for example passing the wrong value for the
size of the array into get_partitions_excluded_by_ne_datums.
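The array-size bug is of a common kind: values are stored only when they can be obtained, so the number stored can be smaller than the length of the input list. A small sketch of the pattern, with hypothetical names (value_from_expr, collect_values) and plain ints in place of Datums:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: returns 1 and sets *out if 'expr' is usable. */
int
value_from_expr(int expr, int *out)
{
    if (expr < 0)
        return 0;               /* pretend negative inputs cannot be used */
    *out = expr;
    return 1;
}

/* Copy the usable values into 'out' and return how many were stored. */
size_t
collect_values(const int *exprs, size_t nexprs, int *out)
{
    size_t      n = 0;
    size_t      i;

    for (i = 0; i < nexprs; i++)
    {
        int         v;

        if (value_from_expr(exprs[i], &v))
            out[n++] = v;
    }

    /* Callers must use this count, not nexprs, as the array size. */
    return n;
}
```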
I also changed how the Bitmapsets are handled in the step functions
and got rid of the Noop step type completely. I also got rid of the
passing of the srcparts into these functions. I think Robert's idea is
to process the steps in isolation and just combine the partitions
matching each step.
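To make "process the steps in isolation" concrete, here is a simplified sketch in which no step receives the partitions selected so far; each one computes its own set and the caller combines them. It only models the idea: uint64_t masks replace Bitmapsets, and PruneNode, NODE_* and eval_node are hypothetical names, not the ones used in the attached patch.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { NODE_BASE, NODE_AND, NODE_OR, NODE_NOT } NodeKind;

typedef struct PruneNode
{
    NodeKind    kind;
    uint64_t    parts;          /* BASE: matching partitions;
                                 * NOT: partitions excluded by <> clauses */
    int         nargs;
    const struct PruneNode *args[4];
} PruneNode;

/*
 * Return the partitions matching 'n' on its own.  'all' is the mask of
 * every partition; it is the only shared input, so the result of a step
 * never depends on which steps ran before it.
 */
uint64_t
eval_node(const PruneNode *n, uint64_t all)
{
    uint64_t    result;
    int         i;

    switch (n->kind)
    {
        case NODE_BASE:
            return n->parts & all;
        case NODE_NOT:
            /* everything except the excluded partitions matches */
            return all & ~n->parts;
        case NODE_OR:
            if (n->nargs == 0)
                return all;
            result = 0;
            for (i = 0; i < n->nargs; i++)
                result |= eval_node(n->args[i], all);
            return result;
        case NODE_AND:
            result = all;
            for (i = 0; i < n->nargs; i++)
                result &= eval_node(n->args[i], all);
            return result;
    }
    return all;                 /* unknown kind: prune nothing */
}
```

Starting the AND case from 'all' has the same effect as taking the first argument's result whole and intersecting the rest with it.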
It would be great if we could coordinate our efforts here. I'm posting
this patch now just in case you're working or about to work on this.
In the meantime, I'll continue to drip feed cleanup patches. I'll try
to start writing some comments too, once I figure a few things out...
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v38_drowley_delta1.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d42f0b748ab..48fa9dbf939 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -194,18 +194,22 @@ static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
static Bitmapset *perform_pruning_step(PartitionPruneContext *context,
- PartitionPruneStep *step,
- Bitmapset *srcparts);
+ PartitionPruneStep *step);
static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
PartitionPruneStepOp *opstep);
static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
- PartitionPruneStepCombine *cstep,
- Bitmapset *srcparts);
+ PartitionPruneStepCombine *cstep);
static bool partkey_datum_from_expr(PartitionPruneContext *context,
int partkeyidx, Expr *expr, Datum *value);
-static Bitmapset *get_partitions_for_keys(PartitionPruneContext *context,
- int opstrategy, Datum *values, int nvalues,
- Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues, Bitmapset *nullkeys);
static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
Datum *ne_datums, int n_ne_datums);
@@ -1573,32 +1577,46 @@ get_partition_qual_relid(Oid relid)
* get_matching_partitions
* Determine partitions that survive partition pruning steps
*
- * Returns a Bitmapset of indexes of surviving partitions, or NULL if none
- * survive.
+ * Returns a Bitmapset of indexes of surviving partitions.
*/
Bitmapset *
get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps)
{
- /* First there are no unpruned partitions. */
- Bitmapset *result = bms_add_range(NULL, 0, context->nparts - 1);
- ListCell *lc;
-
- /*
- * If there are multiple pruning steps, we perform them one after another,
- * passing the result of one step as input to another. Based on the type
- * of pruning step, perform_pruning_step may add or remove partitions from
- * the set of partitions it receives as the input.
- */
- foreach(lc, pruning_steps)
+ /* If there are no pruning steps then all partitions match. */
+ if (pruning_steps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ else
{
- PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *result = NULL;
+ ListCell *lc;
+ bool firststep = true;
- result = bms_int_members(result,
- perform_pruning_step(context, step, result));
- }
+ /*
+ * Below we process each partition pruning step one by one. With each
+ * step we intersect the result with those of the previously taken steps so
+ * that we end up with a minimal set of matching partition indexes. When
+ * performing the first step, we take the entire result, so we've
+ * something to intersect on subsequent steps.
+ */
- return result;
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *stepresult;
+
+ stepresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = stepresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, stepresult);
+ }
+ return result;
+ }
}
/* Module-local functions */
@@ -1609,36 +1627,27 @@ get_matching_partitions(PartitionPruneContext *context,
*/
static Bitmapset *
perform_pruning_step(PartitionPruneContext *context,
- PartitionPruneStep *step,
- Bitmapset *srcparts)
+ PartitionPruneStep *step)
{
switch (nodeTag(step))
{
- case T_PartitionPruneStepNoop:
- /* no-op */
- break;
-
case T_PartitionPruneStepOp:
return perform_pruning_base_step(context,
(PartitionPruneStepOp *) step);
case T_PartitionPruneStepCombine:
return perform_pruning_combine_step(context,
- (PartitionPruneStepCombine *) step,
- srcparts);
+ (PartitionPruneStepCombine *) step);
default:
elog(ERROR, "invalid partition pruning step: %d", nodeTag(step));
- break;
+ return NULL; /* keep compiler quiet */
}
-
- return srcparts;
}
/*
* perform_pruning_base_step
- * Returns indexes of partitions as given by get_partitions_for_keys
- * for information contained in a given PartitionPruneStepOp
+ * Returns indexes of partitions which match 'opstep'.
*/
static Bitmapset *
perform_pruning_base_step(PartitionPruneContext *context,
@@ -1652,9 +1661,7 @@ perform_pruning_base_step(PartitionPruneContext *context,
nvalues = 0;
lc = list_head(opstep->values);
- /*
- * Generate the partition look-up key.
- */
+ /* Generate the partition look-up key. */
for (keyno = 0; keyno < context->partnatts; keyno++)
{
if (bms_is_member(keyno, opstep->nullkeys))
@@ -1679,73 +1686,104 @@ perform_pruning_base_step(PartitionPruneContext *context,
}
}
- return get_partitions_for_keys(context,
- opstep->opstrategy,
- values, nvalues,
- opstep->nullkeys);
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values, nvalues,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
}
/*
* perform_pruning_combine_step
- * Returns the set of partitions in 'srcparts' that remain after
- * performing the pruning "combine" step specified in 'cstep'
+ * Returns the set of partitions which match this single combine step.
*/
static Bitmapset *
perform_pruning_combine_step(PartitionPruneContext *context,
- PartitionPruneStepCombine *cstep,
- Bitmapset *srcparts)
+ PartitionPruneStepCombine *cstep)
{
+ Bitmapset *result;
ListCell *lc;
- srcparts = bms_copy(srcparts);
switch (cstep->combineOp)
{
case COMBINE_OR:
- {
- Bitmapset *step_parts = NULL;
-
- foreach(lc, cstep->argsteps)
{
- PartitionPruneStep *step = lfirst(lc);
- Bitmapset *argparts;
+ if (cstep->argsteps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
- /* Recursively get partitions by performing this step. */
- argparts = perform_pruning_step(context, step, srcparts);
- step_parts = bms_add_members(step_parts, argparts);
- }
+ result = NULL;
- return step_parts;
- }
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
- case COMBINE_AND:
- {
- /* First there are no unpruned partitions. */
- Bitmapset *step_parts = bms_add_range(NULL, 0,
- context->nparts - 1);
+ /* Recursively get partitions by performing this step. */
+ argresult = perform_pruning_step(context, step);
+ result = bms_add_members(result, argresult);
+ }
+
+ return result;
+ }
- foreach(lc, cstep->argsteps)
+ case COMBINE_AND:
{
- PartitionPruneStep *step = lfirst(lc);
- Bitmapset *argparts;
+ bool firststep;
- argparts = perform_pruning_step(context, step, srcparts);
- step_parts = bms_int_members(step_parts, argparts);
- }
+ if (cstep->argsteps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
- return step_parts;
- }
+ firststep = true;
+ result = NULL;
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
+
+ argresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = argresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, argresult);
+ }
+
+ return result;
+ }
case COMBINE_NOT:
{
- Bitmapset *step_parts = NULL;
- Datum *ne_datums;
- int n_ne_datums = list_length(cstep->argvalues),
- i;
+ Bitmapset *stepresult;
+ Datum *ne_datums;
+ int n_ne_datums = list_length(cstep->argvalues),
+ i;
/*
- * XXX- The following ad-hoc method of pruning only works for list
- * partitioning. It checks for each partition if all of its
- * accepted values appear in ne_datums[].
+ * Apply not-equal clauses. This only applies in the list
+ * partitioning case as this is the only partition type where we
+ * have knowledge of the entire set of values that can be stored
+ * in a given partition.
*/
ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
i = 0;
@@ -1756,24 +1794,26 @@ perform_pruning_combine_step(PartitionPruneContext *context,
/*
* Note that we're passing 0 for partkeyidx, because there can
- * be only one partition key column for list partitioning.
+ * be only one partition key with list partitioning.
*/
if (partkey_datum_from_expr(context, 0, expr, &datum))
ne_datums[i++] = datum;
}
- step_parts = get_partitions_excluded_by_ne_datums(context,
+ stepresult = get_partitions_excluded_by_ne_datums(context,
ne_datums,
- n_ne_datums);
- return bms_del_members(srcparts, step_parts);
+ i);
+
+ /* All partitions apart from the stepresult partitions match */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ return bms_del_members(result, stepresult);
}
default:
- /* Return the source partitions as is; should never happen. */
- break;
+ elog(ERROR, "Invalid PartitionPruneCombineOp: %d", (int)
+ cstep->combineOp);
+ return NULL; /* keep compiler quiet */
}
-
- return srcparts;
}
/*
@@ -1819,549 +1859,527 @@ partkey_datum_from_expr(PartitionPruneContext *context, int partkeyidx,
}
/*
- * get_partitions_for_keys
- * Returns the index of partitions that
- * given look up keys
- *
- * Input:
- * See the comments above the definition of PartScanKeyInfo to see what
- * kind of information is contained in 'keys'.
- *
- * Outputs:
- * Bitmapset containing indexes of the selected partitions
+ * get_partitions_for_keys_hash
+ * Determine the minimum set of partitions matching the specified values
+ * using hash partitioning.
*/
static Bitmapset *
-get_partitions_for_keys(PartitionPruneContext *context,
- int opstrategy, Datum *values, int nvalues,
- Bitmapset *nullkeys)
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys)
{
FmgrInfo *partsupfunc = context->partsupfunc;
PartitionBoundInfo boundinfo = context->boundinfo;
- int *partindices = boundinfo->indexes,
- default_index = boundinfo->default_index;
- Oid *partcollation = context->partcollation;
+ int *partindices = boundinfo->indexes;
int partnatts = context->partnatts;
- Bitmapset *result = NULL;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
- switch (context->strategy)
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys.
+ */
+ if (nvalues == partnatts)
{
- case PARTITION_STRATEGY_HASH:
- {
- uint64 rowHash;
- bool isnull[PARTITION_MAX_KEYS];
- int i,
- greatest_modulus,
- result_index;
-
- memset(isnull, false, partnatts * sizeof(bool));
- if (!bms_is_empty(nullkeys))
- {
- i = -1;
- while ((i = bms_next_member(nullkeys, i)) >= 0)
- {
- Assert(i < partnatts);
- isnull[i] = true;
- }
- }
+ uint64 rowHash;
+ int greatest_modulus,
+ result_index;
- /*
- * In this case, can only do pruning if we know values for all
- * the keys and they're all non-null.
- */
- if (nvalues == context->partnatts)
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+ }
+ else
+ {
+ /* clauses missing for some keys, return all partitions. */
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ }
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the minimum set of partitions matching the specified values
+ * using list partitioning.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition, because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(default_index);
+ else
+ result = NULL;
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal)
{
- greatest_modulus = get_greatest_modulus(boundinfo);
- rowHash = compute_hash_value(partnatts, partsupfunc, values,
- isnull);
- result_index = partindices[rowHash % greatest_modulus];
- if (result_index >= 0)
- return bms_make_singleton(result_index);
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
}
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
else
- /* Can't do pruning otherwise, so return all partitions. */
- return bms_add_range(NULL, 0, context->nparts - 1);
+ return NULL;
break;
- }
- case PARTITION_STRATEGY_LIST:
- {
- int off,
- minoff,
- maxoff,
- i;
- bool is_equal;
- bool inclusive = false;
-
- Assert(partnatts == 1);
-
- if (!bms_is_empty(nullkeys))
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
{
/*
- * Nulls may exist in only one partition - the partition whose
- * accepted set of values includes null or the default
- * partition if the former doesn't exist.
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
*/
- if (partition_bound_accepts_nulls(boundinfo))
- return bms_make_singleton(boundinfo->null_index);
- else if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(boundinfo->default_index);
- else
- return NULL;
+ off = 0;
}
/*
- * If there are no datums to compare keys with, but there are
- * partitions, just return the default partition if one exists.
+ * off is greater than the numbers of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, which is already set in 'result' if one
+ * exists.
*/
- if (boundinfo->ndatums == 0)
- {
- if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(default_index);
- else
- return NULL; /* shouldn't happen */
- }
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
- minoff = 0;
- maxoff = boundinfo->ndatums - 1;
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, values[0],
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
/*
- * For range queries, always include the default list partition,
- * because list partitions divide the key space in a discontinuous
- * manner, not all values in the given range will have a partition
- * assigned. This may not technically be true for some data types
- * (e.g. integer types), however, we currently lack any sort of
- * infrastructure to provide us with proofs that would allow us to
- * do anything smarter here.
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, which is already set in 'result' if one
+ * exists.
*/
- if (opstrategy != BTEqualStrategyNumber &&
- partition_bound_has_default(boundinfo))
- result = bms_add_member(result, default_index);
+ if (off < 0)
+ return result;
- if (nvalues == 0)
- {
- /*
- * Add indexes of *all* partitions containing non-null
- * values and return.
- */
- for (i = minoff; i <= maxoff; i++)
- result = bms_add_member(result, partindices[i]);
+ maxoff = off;
+ break;
- return result;
- }
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the minimum set of partitions matching the specified values
+ * using range partitioning.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ Bitmapset *nullkeys)
+{
+ FmgrInfo *partsupfunc = context->partsupfunc;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ /*
+ * If there are no datums to compare keys with, or if we got a IS NULL
+ * clause just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
- switch (opstrategy)
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result = bms_add_range(result, partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
{
- case BTEqualStrategyNumber:
+ if (nvalues == partnatts)
{
- off = partition_list_bsearch(partsupfunc,
- partcollation,
- boundinfo, values[0],
- &is_equal);
- if (off >= 0 && is_equal)
- {
- /* An exact matching datum exists. */
- Assert(partindices[off] >= 0);
- return bms_make_singleton(partindices[off]);
- }
+ /* There can only be one partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(partindices[off + 1]);
else if (partition_bound_has_default(boundinfo))
return bms_make_singleton(default_index);
else
return NULL;
- break;
}
-
- case BTGreaterEqualStrategyNumber:
- inclusive = true;
- /* fall through */
- case BTGreaterStrategyNumber:
+ else
{
- off = partition_list_bsearch(partsupfunc,
- partcollation,
- boundinfo, values[0],
- &is_equal);
- if (off >= 0)
+ int saved_off = off;
+
+ /* Matched a prefix of the partition bound at off. */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
{
- /*
- * We don't want the matched datum to be in the
- * result.
- */
- if (!is_equal || !inclusive)
- off++;
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off-1],
+ boundinfo->kind[off-1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
}
- else
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
{
- /*
- * This case means all partition bounds are
- * greater, which in turn means that all
- * partition satisfy this key.
- */
- off = 0;
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
}
-
- /*
- * off is greater than the numbers of datums we have
- * partitions for. The only possible partition that
- * could contain a match is the default partition.
- * Return that, if it exists.
- */
- if (off > boundinfo->ndatums - 1)
- return partition_bound_has_default(boundinfo)
- ? bms_make_singleton(default_index)
- : NULL;
-
- minoff = off;
- break;
- }
-
- case BTLessEqualStrategyNumber:
- inclusive = true;
- /* fall through */
- case BTLessStrategyNumber:
- {
- off = partition_list_bsearch(partsupfunc,
- partcollation,
- boundinfo, values[0],
- &is_equal);
- if (off >= 0 && is_equal && !inclusive)
- off--;
-
- /*
- * off is smaller than the datums of all non-default
- * partitions, meaning there isn't one to return.
- * Return the default partition if one exists.
- */
- if (off < 0)
- return partition_bound_has_default(boundinfo)
- ? bms_make_singleton(default_index)
- : NULL;
-
- maxoff = off;
- break;
+ maxoff = off+1;
}
-
- default:
- elog(ERROR, "invalid btree operator strategy: %d",
- opstrategy);
- break;
}
-
- /* Finally add the partition indexes. */
- for (i = minoff; i <= maxoff; i++)
- result = bms_add_member(result, partindices[i]);
- }
- break;
-
- case PARTITION_STRATEGY_RANGE:
- {
- int off,
- minoff,
- maxoff,
- i;
- bool is_equal;
- bool inclusive = false;
-
- /* Nulls may exist in only the default range partition */
- if (!bms_is_empty(nullkeys))
- return partition_bound_has_default(boundinfo)
- ? bms_make_singleton(boundinfo->default_index)
- : NULL;
-
- /*
- * If there are no datums to compare keys with, but there are
- * partitions, just return the default partition if one exists.
- */
- if (boundinfo->ndatums == 0)
+ else if (off >= 0)
{
- if (partition_bound_has_default(boundinfo))
+ if (partindices[off+1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
return bms_make_singleton(default_index);
else
- return NULL; /* shouldn't happen */
+ return NULL;
}
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
- minoff = 0;
- maxoff = boundinfo->ndatums;
- if (nvalues == 0)
+ if (off < 0)
{
/*
- * Add indexes of *all* partitions containing non-null
- * values and return.
+ * All partition bounds are greater than the key, so include
+ * all partitions in the result.
*/
- if (partindices[minoff] < 0)
- minoff++;
- if (partindices[maxoff] < 0)
- maxoff--;
- result = bms_add_range(result,
- partindices[minoff],
- partindices[maxoff]);
- if (partition_bound_has_default(boundinfo))
- result = bms_add_member(result, default_index);
- return result;
+ off = 0;
}
-
- switch (opstrategy)
+ else
{
- case BTEqualStrategyNumber:
+ if (is_equal && nvalues < partnatts)
{
- off = partition_range_datum_bsearch(partsupfunc,
- partcollation,
- boundinfo,
- nvalues, values,
- &is_equal);
-
- if (off >= 0 && is_equal)
+ /* Matched a prefix of the partition bound at off. */
+ while (off < boundinfo->ndatums - 1)
{
- if (nvalues == partnatts)
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
{
- /* There can only be one partition. */
- if (partindices[off+1] >= 0)
- return bms_make_singleton(partindices[off+1]);
- else if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(default_index);
- else
- return NULL;
- }
- else
- {
- int saved_off = off;
-
- /*
- * Matched a prefix of the partition bound at off.
- */
- while (off >= 1 && off < boundinfo->ndatums - 1)
- {
- int32 cmpval;
-
- cmpval =
- partition_rbound_datum_cmp(partsupfunc,
- partcollation,
- boundinfo->datums[off-1],
- boundinfo->kind[off-1],
- values, nvalues);
- if (cmpval != 0)
- break;
- off--;
- }
- minoff = off;
- off = saved_off;
- while (off < boundinfo->ndatums - 1)
- {
- int32 cmpval;
-
- cmpval =
- partition_rbound_datum_cmp(partsupfunc,
- partcollation,
- boundinfo->datums[off+1],
- boundinfo->kind[off+1],
- values, nvalues);
- if (cmpval != 0)
- break;
+ if (!inclusive)
off++;
- }
- maxoff = off+1;
+ break;
}
+ off = nextoff;
}
- else if (off >= 0)
- {
- if (partindices[off+1] >= 0)
- minoff = maxoff = off + 1;
- else if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(default_index);
- else
- return NULL;
- }
- else if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(default_index);
- else
- return NULL;
-
- if (partindices[minoff] < 0 &&
- minoff < boundinfo->ndatums)
- minoff++;
- if (partindices[maxoff] < 0 && maxoff >= 1)
- maxoff--;
- break;
}
+ else
+ off++;
+ }
- case BTGreaterEqualStrategyNumber:
- inclusive = true;
- /* fall through */
- case BTGreaterStrategyNumber:
- {
- off = partition_range_datum_bsearch(partsupfunc,
- partcollation,
- boundinfo,
- nvalues, values,
- &is_equal);
-
- if (off < 0)
- {
- /*
- * All partition bounds are greater than the key, so
- * include all partitions in the result.
- */
- off = 0;
- }
- else
- {
- if (is_equal && nvalues < partnatts)
- {
- /*
- * Matched a prefix of the partition bound at off.
- */
- while (off < boundinfo->ndatums - 1)
- {
- int32 cmpval;
- int nextoff;
+ minoff = off;
+ break;
- nextoff = inclusive ? off - 1 : off + 1;
- cmpval =
- partition_rbound_datum_cmp(partsupfunc,
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
partcollation,
- boundinfo->datums[nextoff],
- boundinfo->kind[nextoff],
- values, nvalues);
- if (cmpval != 0)
- {
- if (!inclusive)
- off++;
- break;
- }
- off = nextoff;
- }
- }
- else
- off++;
- }
-
- minoff = off;
- break;
- }
+ boundinfo,
+ nvalues, values,
+ &is_equal);
- case BTLessEqualStrategyNumber:
- inclusive = true;
- /* fall through */
- case BTLessStrategyNumber:
+ if (off >= 0)
+ {
+ /* Matched prefix of the partition bound at off. */
+ if (is_equal && nvalues < partnatts)
{
- off = partition_range_datum_bsearch(partsupfunc,
- partcollation,
- boundinfo,
- nvalues, values,
- &is_equal);
-
- if (off >= 0)
+ while (off < boundinfo->ndatums - 1)
{
- /*
- * Matched prefix of the partition bound at off.
- */
- if (is_equal && nvalues < partnatts)
- {
- while (off < boundinfo->ndatums - 1)
- {
- int32 cmpval;
- int nextoff;
+ int32 cmpval;
+ int nextoff;
- nextoff = inclusive ? off + 1 : off - 1;
- cmpval =
- partition_rbound_datum_cmp(partsupfunc,
- partcollation,
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
boundinfo->datums[nextoff],
- boundinfo->kind[nextoff],
- values, nvalues);
- if (cmpval != 0)
- {
- if (!inclusive)
- off--;
- break;
- }
- off = nextoff;
- }
-
- off++;
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
}
- else if (!is_equal || inclusive)
- off++;
- }
- else
- {
- /*
- * All partition bounds are greater than the key, so
- * select none of the partitions, except the default.
- */
- if (partition_bound_has_default(boundinfo))
- return bms_make_singleton(default_index);
- return NULL;
+ off = nextoff;
}
- maxoff = off;
- break;
+ off++;
}
-
- default:
- elog(ERROR, "invalid btree operator strategy: %d",
- opstrategy);
- break;
+ else if (!is_equal || inclusive)
+ off++;
}
-
- Assert (minoff >= 0 && maxoff >= 0);
- if (partindices[minoff] < 0)
+ else
{
- int lastkey = nvalues - 1;
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
- if (minoff >=0 && minoff < boundinfo->ndatums &&
- boundinfo->kind[minoff][lastkey] ==
- PARTITION_RANGE_DATUM_VALUE &&
- partition_bound_has_default(boundinfo))
- result = bms_add_member(result, default_index);
+ maxoff = off;
+ break;
- minoff++;
- }
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
- if (maxoff >= 1 && partindices[maxoff] < 0)
- {
- int lastkey = nvalues - 1;
+ Assert (minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
- if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
- boundinfo->kind[maxoff - 1][lastkey] ==
- PARTITION_RANGE_DATUM_VALUE &&
- partition_bound_has_default(boundinfo))
- result = bms_add_member(result, default_index);
+ if (minoff >=0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
- maxoff--;
- }
+ minoff++;
+ }
- if (minoff <= maxoff)
- result = bms_add_range(result,
- partindices[minoff],
- partindices[maxoff]);
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
- if (partition_bound_has_default(boundinfo))
- {
- /*
- * Since partition keys with nulls are mapped to the default
- * range partition, we must include the default partition if
- * some keys could be null.
- */
- if (nvalues < partnatts)
- result = bms_add_member(result, default_index);
+ if (maxoff >=0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
- /*
- * There may exist a range of values unassigned to any
- * non-default partition between the datums at minoff and
- * maxoff. Add the default partition in that case.
- */
- for (i = minoff; i <= maxoff; i++)
- {
- if (partindices[i] < 0)
- return bms_add_member(result, default_index);
- }
- }
- }
- break;
+ maxoff--;
+ }
- default:
- elog(ERROR, "unexpected partition strategy: %d",
- context->strategy);
- break;
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys could
+ * be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
}
return result;
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2f1d5bd2a1e..ea890385a33 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2132,19 +2132,6 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
-/*
- * _copyPartitionPruneStepNoop
- */
-static PartitionPruneStepNoop *
-_copyPartitionPruneStepNoop(const PartitionPruneStepNoop *from)
-{
- PartitionPruneStepNoop *newnode = makeNode(PartitionPruneStepNoop);
-
- /* Nothing to copy. */
-
- return newnode;
-}
-
/*
* _copyPartitionPruneStepOp
*/
@@ -5051,9 +5038,6 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
- case T_PartitionPruneStepNoop:
- retval = _copyPartitionPruneStepNoop(from);
- break;
case T_PartitionPruneStepOp:
retval = _copyPartitionPruneStepOp(from);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 9f50552e4e9..0189a931e97 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,9 +2146,6 @@ expression_tree_walker(Node *node,
return true;
}
break;
- case T_PartitionPruneStepNoop:
- /* No sub-structure. */
- return true;
case T_PartitionPruneStepOp:
{
PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
@@ -2953,16 +2950,6 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
- case T_PartitionPruneStepNoop:
- {
- PartitionPruneStepNoop *noopstep = (PartitionPruneStepNoop *) node;
- PartitionPruneStepNoop *newnode;
-
- FLATCOPY(newnode, noopstep, PartitionPruneStepNoop);
-
- return (Node *) newnode;
- }
- break;
case T_PartitionPruneStepOp:
{
PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index aa18cc2ae44..95adf3e7abf 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -284,17 +284,16 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
{
/*
* No steps means the arg wasn't a clause matching
- * this partition key. We cannot prune using such
- * an arg. To indicate that to the pruning code,
- * we must construct a PartitionPruneStepNoop and
- * append it as an argument of the OR pruning combine
- * step. However, if we can prove using constraint
- * exclusion that the clause refutes the table's
- * partition constraint (if it's sub-partitioned),
- * we need not bother with that.
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * argsteps to an empty List. However, if we can
+ * prove using constraint exclusion that the clause
+ * refutes the table's partition constraint (if it's
+ * sub-partitioned), we need not bother with that.
*/
List *partconstr = rel->partition_qual;
- PartitionPruneStepNoop *noop;
+ PartitionPruneStepCombine *orstep;
if (partconstr)
{
@@ -309,8 +308,10 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
continue;
}
- noop = makeNode(PartitionPruneStepNoop);
- all_arg_steps = lappend(all_arg_steps, noop);
+ orstep = makeNode(PartitionPruneStepCombine);
+ orstep->combineOp = COMBINE_OR;
+ orstep->argsteps = NIL;
+ all_arg_steps = lappend(all_arg_steps, orstep);
}
}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 959fed848b2..7a14bbb10b1 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,7 +192,6 @@ typedef enum NodeTag
T_OnConflictExpr,
T_IntoClause,
T_PartitionPruneStep,
- T_PartitionPruneStepNoop,
T_PartitionPruneStepOp,
T_PartitionPruneStepCombine,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 47ac3da77a1..2af9512dd92 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1516,12 +1516,6 @@ typedef struct PartitionPruneStep
NodeTag type;
} PartitionPruneStep;
-/* a no-op step that doesn't prune any of the partitions. */
-typedef struct PartitionPruneStepNoop
-{
- PartitionPruneStep step;
-} PartitionPruneStepNoop;
-
typedef struct PartitionPruneStepOp
{
PartitionPruneStep step;
Hi David.
On 2018/03/23 16:38, David Rowley wrote:
On 21 March 2018 at 00:07, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached is further revised version.
Hi Amit,
Thanks for sending the v38 patch.
I started reviewing this, but I just ended up starting to hack at the
patch instead. It's still got quite a bit of work to be done as I
think, unfortunately, the cross type stuff is still pretty broken.
There's not really any sort of checking done to make sure you've found
a valid cross type hash or compare function in the code which results
in errors like:
create table hashp (a int, b numeric) partition by hash(a,b);
create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
create table hashp3 partition of hashp for values with (modulus 4, remainder 2);
create table hashp4 partition of hashp for values with (modulus 4, remainder 3);
explain select * from hashp where a = 3::smallint and b = 1.0;
ERROR: cache lookup failed for function 0
Hmm yes. I had realized that while addressing Robert's related comment.
I'm not really sure if this should be a matter of doing an if
(!OidIsValid(new_supfuncid)) return false; I think the
context->partsupfunc must be pretty broken in cases like:
create table listp (a bigint) partition by list(a);
create table listp1 partition of listp for values in(1);
select * from listp where a <> 1::smallint and a <> 1::bigint;
The current patch simply just remembers the last comparison function
for comparing int8 to int4 and uses that one for the int8 to int2
comparison too.
Probably we need to cache the comparison function's Oid in with the
Expr in the step and use the correct one each time. I'm unsure of how
the fmgr info should be cached, but looks like it certainly cannot be
cached in the context in an array per partition key. I've so far only
thought some sort of hash table, but I'm sure there must be a much
better way to do this.
Yeah, I realized that simply replacing the context->partsupfunc member is
not a solution.
In the updated patch (that is, after incorporating your changes), I have
moved this partsupfunc switching to the caller of partkey_datum_from_expr
instead of doing it there. The new patch also checks that the returned
function OID is valid; if it is not, we don't use the expression's value
for pruning.
So now we statically allocate a partsupfunc array on every invocation of
perform_pruning_base_step() or of get_partitions_excluded_by_ne_datums().
Considering run-time pruning, we may have to find some other place to
cache that.
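To make the idea above concrete, here is a minimal toy sketch in C of a pruning step that carries its own comparison function and is rejected for pruning when no cross-type function can be found. Everything in it (ToyType, ToyStep, toy_lookup_cmp, toy_init_step, and the lookup table) is hypothetical and only models the behaviour described in this thread; it is not the patch's code and does not use PostgreSQL's catalog or fmgr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for type OIDs; not PostgreSQL's real values. */
typedef enum { TOY_INT2, TOY_INT4, TOY_INT8, TOY_NUMERIC } ToyType;

typedef int (*ToyCmpFn) (long lhs, long rhs);

static int
toy_cmp_long(long lhs, long rhs)
{
    return (lhs > rhs) - (lhs < rhs);
}

/* Toy "operator family": which (key type, value type) pairs are supported. */
typedef struct ToyCmpEntry
{
    ToyType     lefttype;
    ToyType     righttype;
    ToyCmpFn    fn;
} ToyCmpEntry;

static const ToyCmpEntry toy_cmp_table[] = {
    {TOY_INT8, TOY_INT8, toy_cmp_long},
    {TOY_INT8, TOY_INT4, toy_cmp_long},
    {TOY_INT8, TOY_INT2, toy_cmp_long},
};

/* Returns NULL when no function exists, modelling an invalid function OID. */
static ToyCmpFn
toy_lookup_cmp(ToyType keytype, ToyType valtype)
{
    size_t      i;

    for (i = 0; i < sizeof(toy_cmp_table) / sizeof(toy_cmp_table[0]); i++)
    {
        if (toy_cmp_table[i].lefttype == keytype &&
            toy_cmp_table[i].righttype == valtype)
            return toy_cmp_table[i].fn;
    }
    return NULL;
}

/* A step remembers the function matching its own value type. */
typedef struct ToyStep
{
    ToyType     valtype;
    long        value;
    ToyCmpFn    cmpfn;
} ToyStep;

/* Returns false if the value cannot be used for pruning. */
static bool
toy_init_step(ToyStep *step, ToyType keytype, ToyType valtype, long value)
{
    assert(step != NULL);
    step->valtype = valtype;
    step->value = value;
    step->cmpfn = toy_lookup_cmp(keytype, valtype);
    return step->cmpfn != NULL;
}
```

Because each step holds its own function, two clauses on the same key with different value types (as in the listp example above) would no longer share one cached entry per partition key.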
I started hacking at partition.c and ended up changing quite a few
things. I changed get_partitions_for_keys into 3 separate functions,
one each for hash, list and range, and tidied a few things up in that area.
There were a few bugs, for example passing the wrong value for the
size of the array into get_partitions_excluded_by_ne_datums.
I also changed how the Bitmapsets are handled in the step functions
and got rid of the Noop step type completely. I also got rid of the
passing of the srcparts into these functions. I think Robert's idea is
to process the steps in isolation and just combine the partitions
matching each step.
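As a rough illustration of processing steps in isolation and then combining their results, the following toy C sketch represents each step's matching partitions as a bitmask and merges them with OR or AND. The names (ToyCombineOp, toy_all_partitions, toy_combine) are hypothetical, and a plain 32-bit mask stands in for a Bitmapset. Treating an empty argument list as "keep every partition" is an assumption that follows the comment in the patch about using a combine step with empty argsteps when a clause cannot be used for pruning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_COMBINE_OR, TOY_COMBINE_AND } ToyCombineOp;

/* Bit i set means partition i survives; supports up to 32 partitions. */
static uint32_t
toy_all_partitions(int nparts)
{
    assert(nparts > 0 && nparts <= 32);
    return nparts == 32 ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);
}

/*
 * Combine the independently computed results of the argument steps.
 * With no arguments there is nothing to prune with, so all partitions
 * are returned (assumption modelled on the empty-argsteps combine step).
 */
static uint32_t
toy_combine(ToyCombineOp op, const uint32_t *argresults, size_t nargs,
            int nparts)
{
    uint32_t    all = toy_all_partitions(nparts);
    uint32_t    result;
    size_t      i;

    if (nargs == 0)
        return all;

    /* Start from the identity element of the chosen operation. */
    result = (op == TOY_COMBINE_OR) ? 0 : all;
    for (i = 0; i < nargs; i++)
    {
        if (op == TOY_COMBINE_OR)
            result |= argresults[i];
        else
            result &= argresults[i];
    }
    return result & all;
}
```

Since no step needs to see the partitions selected by an earlier step, there is no srcparts-style input here; each argument result stands alone until it is combined.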
It would be great if we could coordinate our efforts here. I'm posting
this patch now just in case you're working or about to work on this.
Thanks a lot for making all those changes and sharing the patch. I've
incorporated them in the attached latest version.
In the meantime, I'll continue to drip feed cleanup patches. I'll try
to start writing some comments too, once I figure a few things out...
Here is the updated version.
I'm still thinking about what to do about avoiding recursion when
performing combine steps [1] as Robert mentioned in his email.
Thanks,
Amit
Attachments:
v39-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From dca4e1c3a9dc55c38123e999631578e8303f4ffe Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v39 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..093ca5208e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v39-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From dd0eb6ec8d059b60a8ea9889be96f0eb5e954316 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v39 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
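The cross-type tests at the end of partition_prune.sql rely on a detail worth spelling out: 100000000000000 equals 2^14 * 5^14, so its low 16 bits are exactly 16384, the bound of lparted_by_int2_16384. A lookup that narrowed the constant to smallint could therefore match that partition by mistake. Below is a minimal sketch in C, not code from the patch, of comparing in the wider type instead. find_list_bound is a hypothetical helper, and a linear scan stands in for the binary search the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the list bound equal to 'key',
 * or -1 if no bound matches.  The bounds are smallint (int16_t) values and
 * the key is a bigint (int64_t) constant taken from the query.
 */
static int
find_list_bound(const int16_t *bounds, size_t nbounds, int64_t key)
{
    size_t i;

    assert(bounds != NULL || nbounds == 0);

    for (i = 0; i < nbounds; i++)
    {
        /* Widen the bound to 64 bits; never narrow the key to 16 bits. */
        if ((int64_t) bounds[i] == key)
            return (int) i;
    }

    return -1;
}
```

With bounds {1, 16384}, the key 100000000000000 matches nothing, so every partition can be pruned, which is what the expected "One-Time Filter: false" plan for lparted_by_int2 shows.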
Attachment: v39-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 41a50af6bd3a8ad51c8776cbe0b7ee1ad11db672 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v39 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 999 +++++++++++++++++++
src/backend/nodes/copyfuncs.c | 36 +
src/backend/nodes/nodeFuncs.c | 41 +
src/backend/optimizer/path/allpaths.c | 21 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1282 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 35 +
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 318 ++++--
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 6 +
18 files changed, 2807 insertions(+), 86 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
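
As a rough illustration of how the results of individual pruning steps might be combined into the final set of partitions to scan, here is a small standalone sketch. It is not code from this patch: PartSet is a hypothetical 64-bit stand-in for Bitmapset (so it assumes at most 64 partitions), and combine_and, combine_or and combine_not are illustrative names that loosely mirror COMBINE_AND, COMBINE_OR and COMBINE_NOT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means partition i survives. */
typedef uint64_t PartSet;

/* Set containing partitions 0 .. nparts - 1; nparts must be in 0 .. 64. */
static PartSet
partset_all(int nparts)
{
    assert(nparts >= 0 && nparts <= 64);
    if (nparts == 64)
        return UINT64_MAX;
    return (UINT64_C(1) << nparts) - 1;
}

/* AND: intersect all step results; with no steps, nothing is pruned. */
static PartSet
combine_and(const PartSet *stepresults, size_t nsteps, int nparts)
{
    PartSet result = partset_all(nparts);
    size_t  i;

    for (i = 0; i < nsteps; i++)
        result &= stepresults[i];
    return result;
}

/* OR: union of all argument results; with no arguments, all partitions match. */
static PartSet
combine_or(const PartSet *argresults, size_t nargs, int nparts)
{
    PartSet result = 0;
    size_t  i;

    if (nargs == 0)
        return partset_all(nparts);
    for (i = 0; i < nargs; i++)
        result |= argresults[i];
    return result;
}

/* NOT: every partition except those in 'excluded'. */
static PartSet
combine_not(PartSet excluded, int nparts)
{
    return partset_all(nparts) & ~excluded;
}
```

Starting the AND case from the full set gives the same answer as taking the first step's result and intersecting the rest, provided every step result is a subset of the full set.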
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..2df2530d5e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,28 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step);
+static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1553,9 +1575,986 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ /* If there are no pruning steps then all partitions match. */
+ if (pruning_steps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ else
+ {
+ Bitmapset *result = NULL;
+ ListCell *lc;
+ bool firststep = true;
+
+ /*
+ * Below we process each partition pruning step one by one. With each
+ * step we intersect its result with that of the previously taken steps,
+ * so that we end up with a minimal set of matching partition indexes.
+ * When performing the first step, we take the entire result, so we've
+ * something to intersect on subsequent steps.
+ */
+
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *stepresult;
+
+ stepresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = stepresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, stepresult);
+ }
+ return result;
+ }
+}
+
/* Module-local functions */
/*
+ * perform_pruning_step
+ * Performs one PartitionPruneStep
+ */
+static Bitmapset *
+perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step)
+{
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ return perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+
+ case T_PartitionPruneStepCombine:
+ return perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step);
+
+ default:
+ elog(ERROR, "invalid partition pruning step: %d", nodeTag(step));
+ return NULL; /* keep compiler quiet */
+ }
+}
+
+/*
+ * perform_pruning_base_step
+ * Returns indexes of partitions which match 'opstep'.
+ */
+static Bitmapset *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ nvalues = 0;
+ lc = list_head(opstep->values);
+
+ /* Generate the partition look-up key. */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid exprTyp = exprType((Node *) expr);
+
+ /* Check if we need to use a different comparison function. */
+ if (context->partopcintype[keyno] != exprTyp)
+ {
+ Oid cmpfn;
+ int16 procnum;
+
+ procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+ cmpfn = get_opfamily_proc(context->partopfamily[keyno],
+ context->partopcintype[keyno],
+ exprTyp, procnum);
+ if (OidIsValid(cmpfn))
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else /* Can't really use datum for pruning. */
+ continue;
+ }
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc = lnext(lc);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_combine_step
+ * Returns the set of partitions which match this single combine step.
+ */
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep)
+{
+ Bitmapset *result;
+ ListCell *lc;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ {
+ if (cstep->argsteps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ result = NULL;
+
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
+
+ /* Recursively get partitions by performing this step. */
+ argresult = perform_pruning_step(context, step);
+ result = bms_add_members(result, argresult);
+ }
+
+ return result;
+ }
+
+ case COMBINE_AND:
+ {
+ bool firststep;
+
+ if (cstep->argsteps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ firststep = true;
+ result = NULL;
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
+
+ argresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = argresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, argresult);
+ }
+
+ return result;
+ }
+
+ case COMBINE_NOT:
+ {
+ Bitmapset *stepresult;
+ Datum *ne_datums;
+ int n_ne_datums = list_length(cstep->argvalues),
+ i;
+ FmgrInfo **partsupfunc;
+
+ /*
+ * Apply not-equal clauses. This only applies in the list
+ * partitioning case as this is the only partition type where
+ * we have knowledge of the entire set of values that can be
+ * stored in a given partition.
+ */
+ ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
+
+ /*
+ * Some datums may require a comparison function other than
+ * the default partitioning-specific one.
+ */
+ partsupfunc = (FmgrInfo **)
+ palloc0(n_ne_datums * sizeof(FmgrInfo *));
+ i = 0;
+ foreach(lc, cstep->argvalues)
+ {
+ Expr *expr = lfirst(lc);
+ Datum datum;
+
+ /*
+ * Note that we always use partition key 0 below, because there
+ * can be only one partition key with list partitioning.
+ */
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid exprTyp = exprType((Node *) expr);
+
+ /*
+ * Check if we need to use a different comparison
+ * function for this value.
+ */
+ if (context->partopcintype[0] != exprTyp)
+ {
+ Oid cmpfn;
+
+ cmpfn = get_opfamily_proc(context->partopfamily[0],
+ context->partopcintype[0],
+ exprTyp, BTORDER_PROC);
+ if (OidIsValid(cmpfn))
+ {
+ partsupfunc[i] = palloc0(sizeof(FmgrInfo));
+ fmgr_info(cmpfn, partsupfunc[i]);
+ }
+ else /* Can't really use datum for pruning. */
+ continue;
+ }
+ else
+ partsupfunc[i] = &context->partsupfunc[0];
+
+ ne_datums[i++] = datum;
+ }
+ }
+
+ stepresult = get_partitions_excluded_by_ne_datums(context,
+ ne_datums,
+ i,
+ partsupfunc);
+
+ /* All partitions apart from the stepresult partitions match */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ return bms_del_members(result, stepresult);
+ }
+
+ default:
+ elog(ERROR, "invalid PartitionPruneCombineOp: %d", (int)
+ cstep->combineOp);
+ return NULL; /* keep compiler quiet */
+ }
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Determine the minimum set of partitions matching the specified values
+ * using hash partitioning.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus,
+ result_index;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the minimum set of partitions matching the specified values
+ * using list partitioning.
+ *
+ * 'nvalues', if non-zero, must be exactly 1, as there is only one list key.
+ * 'value' contains the value to use for pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains list partitioning comparison function to be used to
+ * perform partition_list_bsearch
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(default_index);
+ else
+ result = NULL;
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is beyond the last datum we have partitions for, the
+ * only possible partition that could contain a match is the
+ * default partition, which is already set in 'result' if one
+ * exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is smaller than the datums of all non-default partitions,
+ * the only possible partition that could contain a match is the
+ * default partition, which is already set in 'result' if one
+ * exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the minimum set of partitions matching the specified values
+ * using range partitioning.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains range partitioning comparison function to be used to
+ * perform partition_range_datum_bsearch or partition_rbound_datum_cmp
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result = bms_add_range(result, partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(partindices[off + 1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /* Matched a prefix of the partition bound at off. */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off + 1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off + 1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so include
+ * all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /* Matched a prefix of the partition bound at off. */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /* Matched prefix of the partition bound at off. */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys could
+ * be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc)
+{
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ i,
+ *datums_in_part,
+ *datums_found;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Bitmapset *excluded_parts;
+ Bitmapset *foundoffsets = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc[i], partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /*
+ * No partitions can be excluded if none of the partitions accept the
+ * datums in ne_datums[].
+ */
+ if (bms_is_empty(foundoffsets))
+ return NULL;
+
+ /*
+ * Since each list partition can permit multiple values, we must ensure
+ * that we got clauses for all those values before we can eliminate the
+ * entire partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found matches the number of datums allowed in the partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ datums_found = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_found[boundinfo->indexes[i]]++;
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we found
+ * clauses for all its permitted values. We must be careful here not to
+ * eliminate the default partition. We can recognize that by it having a
+ * zero value in both arrays.
+ */
+ excluded_parts = NULL;
+
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_found[i] >= datums_in_part[i] && datums_found[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+
+ /*
+ * Because the above clauses are strict, we can also exclude the NULL
+ * partition, provided it does not also allow non-NULL values.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ pfree(datums_in_part);
+ pfree(datums_found);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..0629607cf4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,36 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(values);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(argsteps);
+ COPY_NODE_FIELD(argvalues);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5054,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..0a3e32ecd1 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,24 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->values, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+
+ if (walker((Node *) cstep->argsteps, context))
+ return true;
+ if (walker((Node *) cstep->argvalues, context))
+ return true;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2950,29 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->values, opstep->values, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep = (PartitionPruneStepCombine *) node;
+ PartitionPruneStepCombine *newnode;
+
+ FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
+ MUTATE(newnode->argsteps, cstep->argsteps, List *);
+ MUTATE(newnode->argvalues, cstep->argvalues, List *);
+
+ return (Node *) newnode;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..f64a2bf090 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,12 +868,21 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
Assert(IS_SIMPLE_REL(rel));
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
/*
* Initialize to compute size estimates for whole append relation.
*
@@ -1121,6 +1131,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (IS_PARTITIONED_REL(rel) && did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * This child need not be scanned, so we can omit it from the
+ * appendrel.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..6e7bec429b
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1282 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to prune partitions of a partitioned table by
+ * comparing the provided set of clauses with the table's partitions'
+ * boundaries
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'value' */
+ Expr *value; /* The value the partition key is being
+ * compared to */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of relations belonging to the minimum set of
+ * partitions which must be scanned to satisfy rel's baserestrictinfo
+ * quals or NULL if no partitions exist.
+ *
+ * Only call this if 'rel' corresponds to a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ return generate_partition_pruning_steps_internal(rel, clauses,
+ constfalse);
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * For each operator clause that's matched with a partition key, we generate
+ * a PartitionPruneStepOp containing relevant details of the operator and
+ * the expression whose value to use for comparison against partition bounds.
+ *
+ * If we encounter an OR clause, we generate a PartitionPruneStepCombine whose
+ * arguments are other partition pruning steps, each of which might be a
+ * PartitionPruneStepOp or another PartitionPruneStepCombine.
+ *
+ * If we find a RestrictInfo that's marked as pseudoconstant and contains a
+ * constant false value for clause, we stop generating any further steps and
+ * return NIL (no pruning steps) after setting *constfalse to true.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of important lists before passing them to this
+ * function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber],
+ *ne_clauses = NIL;
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL,
+ *opsteps = NIL;
+ ListCell *lc;
+ int i;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *all_arg_steps = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ /*
+ * If arg was a clause matching this partition key, we'd
+ * get back the corresponding pruning step.
+ */
+ if (argsteps != NIL)
+ {
+ Assert(list_length(argsteps) == 1);
+ all_arg_steps = lappend(all_arg_steps,
+ linitial(argsteps));
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * argsteps to an empty List. However, if we can
+ * prove using constraint exclusion that the clause
+ * refutes the table's partition constraint (if it's
+ * sub-partitioned), we need not bother with that.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStepCombine *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = makeNode(PartitionPruneStepCombine);
+ orstep->combineOp = COMBINE_OR;
+ orstep->argsteps = NIL;
+ all_arg_steps = lappend(all_arg_steps, orstep);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_OR;
+ combineStep->argsteps = all_arg_steps;
+ combineStep->argvalues = NIL;
+ result = lappend(result, combineStep);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps;
+
+ argsteps = generate_partition_pruning_steps_internal(rel,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_AND;
+ combineStep->argsteps = argsteps;
+ combineStep->argvalues = NIL;
+ result = lappend(result, combineStep);
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+
+ generate_opsteps = true;
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* The clause is self-contradictory, so nothing can match. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /* go check for the next key. */
+ break;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /* Combine values from all <> operator clauses into one prune step. */
+ if (ne_clauses != NIL)
+ {
+ List *argvalues = NIL;
+ PartitionPruneStepCombine *combineStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ argvalues = lappend(argvalues, pc->value);
+ }
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_NOT;
+ combineStep->argsteps = NIL;
+ combineStep->argvalues = argvalues;
+ result = lappend(result, combineStep);
+ }
+
+ /* There was nothing but combining steps in the clauses we got. */
+ if (!generate_opsteps)
+ return result;
+
+ /*
+ * Now we have one list of clauses per partition key. To be useful for
+ * pruning, we must have clauses for a prefix of partition keys in the
+ * case of range partitioning. For hash partitioning, if a column doesn't
+ * have necessary equality clause, there should be an IS NULL clause,
+ * otherwise pruning is not possible.
+ */
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ need_next_less = true;
+ need_next_eq = true;
+ need_next_greater = true;
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NIL;
+
+ if (clauselist == NIL &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ break;
+
+ if (!(need_next_less || need_next_eq || need_next_greater))
+ break;
+
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+ }
+
+ /*
+ * Generate actual steps for various operator strategies by generating
+ * tuples of values, possibly multiple per operator strategy.
+ *
+ * XXX - add more description
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each non-equality strategy, generate tuples of values
+ * such that each tuple's non-last values come from an
+ * equality clause.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ pc = lfirst(lc);
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ if (prefix == NIL && pc->keyno > 0)
+ continue;
+
+ /*
+ * Considering pc->value as the last value in the
+ * pruning tuple, try to generate pruning steps for
+ * tuples containing various combinations of values
+ * for earlier columns from the clauses in prefix.
+ */
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ PartClauseInfo *pc;
+ ListCell *lc1;
+
+ if (eq_clauses != NIL)
+ {
+ pc = llast(eq_clauses);
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+
+ prefix = lappend(prefix, pc);
+ }
+
+ for_each_cell(lc1, lc)
+ {
+ pc_steps = get_steps_using_prefix(pc->op_strategy,
+ pc->value,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /*
+ * Generate one prune step for the information derived from IS NULL and IS
+ * NOT NULL clauses. Note that for IS NOT NULL clauses, simply having the
+ * step suffices; there is no need to propagate the exact details of which
+ * keys are required to be NOT NULL.
+ */
+ if (opsteps == NIL &&
+ (!bms_is_empty(nullkeys) || !bms_is_empty(notnullkeys)))
+ {
+ PartitionPruneStepOp *opstep;
+
+ opstep = makeNode(PartitionPruneStepOp);
+ opstep->nullkeys = nullkeys;
+ opsteps = lappend(opsteps, opstep);
+ }
+
+ /* Add opsteps to result. */
+ result = list_concat(result, opsteps);
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match some other key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all
+ * even if it may have been matched with a key due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *value;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &value))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ value = rightop;
+ else if (equal(rightop, partkey))
+ {
+ value = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile value to prune partitions. */
+ if (contain_volatile_functions((Node *) value))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, so we try to perform partition pruning with such
+ * an operator only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm that the operator is really '<>', check that its
+ * negator is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->value = value;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
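+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* No usable negator, so this isn't a '<>' we can handle. */
+ return PARTCLAUSE_UNSUPPORTED;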
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form: saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * For an ANY clause with more than one element, build a combine step
+ * as if for an OR clause. Otherwise, generate steps from the element
+ * clauses, treating them as an implicitly ANDed list.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing a true or false value and returns true
+ * if we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* See if the clause, or the argument of its NOT, matches this key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ *
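+ * Generate a list of PartitionPruneStepOp steps, one for each combination
+ * of values that the clauses in 'prefix' supply for the partition keys
+ * preceding step_lastkeyno, with step_lastvalue appended as the value for
+ * the last key. If 'prefix' is empty, a single step containing only
+ * step_lastvalue is returned.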
+ */
+static List *
+get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix lastvalue with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
+
+ step->opstrategy = step_opstrategy;
+ step->values = list_make1(step_lastvalue);
+ step->nullkeys = step_nullkeys;
+
+ return list_make1(step);
+ }
+
+ return get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ list_head(prefix),
+ NIL);
+}
+
+static List *
+get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastvalue,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix,
+ ListCell *start,
+ List *step_values)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int step_keyno;
+
+ Assert(start != NULL);
+ step_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (step_keyno == step_lastkeyno - 1)
+ {
+ Assert(list_length(step_values) == step_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStepOp *step;
+ List *step_values1;
+
+ if (pc->keyno > step_keyno)
+ break;
+
+ step_values1 = list_copy(step_values);
+ step_values1 = lappend(step_values1, pc->value);
+ step_values1 = lappend(step_values1, step_lastvalue);
+
+ step = makeNode(PartitionPruneStepOp);
+ step->opstrategy = step_opstrategy;
+ step->values = step_values1;
+ step->nullkeys = step_nullkeys;
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > step_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == 0)
+ {
+ list_free(step_values);
+ step_values = list_make1(pc->value);
+ }
+ else if (pc->keyno == step_keyno)
+ step_values = lappend(step_values, pc->value);
+ else
+ break;
+
+ result =
+ list_concat(result,
+ list_copy(get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastvalue,
+ step_lastkeyno,
+ step_nullkeys,
+ prefix,
+ next_start,
+ step_values)));
+ }
+ }
+
+ return result;
+}
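As an illustration of what get_steps_using_prefix() appears to compute, here is a minimal standalone sketch; it is not the patch's code. It assumes the prefix clauses are sorted by key number and cover every key before the last one, and it uses plain ints in place of Expr nodes. PrefixClause, Step, MAX_KEYS, emit_steps() and steps_using_prefix() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 8				/* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for PartClauseInfo: one value for one key column. */
typedef struct PrefixClause
{
	int			keyno;			/* partition key column index, 0-based */
	int			value;			/* comparison value (an Expr * in the patch) */
} PrefixClause;

/* One generated step: a value for each key up to and including the last. */
typedef struct Step
{
	int			nvalues;
	int			values[MAX_KEYS];
} Step;

/*
 * Recursively pick one clause per key.  'start' indexes the first clause
 * for key number 'ncur'; cur[0..ncur-1] holds the values chosen so far.
 * Returns the updated number of steps written to 'out'.
 */
static size_t
emit_steps(const PrefixClause *prefix, size_t nprefix, size_t start,
		   int lastkeyno, int lastvalue,
		   int *cur, int ncur, Step *out, size_t nout, size_t maxout)
{
	size_t		end = start;
	size_t		i;

	if (ncur == lastkeyno)
	{
		/* All earlier keys have a value; append the last key's value. */
		assert(nout < maxout);
		for (i = 0; i < (size_t) ncur; i++)
			out[nout].values[i] = cur[i];
		out[nout].values[ncur] = lastvalue;
		out[nout].nvalues = ncur + 1;
		return nout + 1;
	}

	/* Find the run of clauses that refer to key number 'ncur'. */
	while (end < nprefix && prefix[end].keyno == ncur)
		end++;

	/* Try each of them in turn, then continue with the next key. */
	for (i = start; i < end; i++)
	{
		cur[ncur] = prefix[i].value;
		nout = emit_steps(prefix, nprefix, end, lastkeyno, lastvalue,
						  cur, ncur + 1, out, nout, maxout);
	}
	return nout;
}

/* Entry point: returns the number of steps written to 'out'. */
size_t
steps_using_prefix(const PrefixClause *prefix, size_t nprefix,
				   int lastkeyno, int lastvalue,
				   Step *out, size_t maxout)
{
	int			cur[MAX_KEYS];

	assert(lastkeyno >= 0 && lastkeyno < MAX_KEYS);
	return emit_steps(prefix, nprefix, 0, lastkeyno, lastvalue,
					  cur, 0, out, 0, maxout);
}
```

With two clauses on key 0 and one on key 1, the sketch produces two steps, one per choice of the key 0 value. With an empty prefix and lastkeyno equal to 0 it produces a single one-value step, which corresponds to the quick exit in the function above.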
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 093ca5208e..7c1b0de295 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..fb29a66a64 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +95,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..046f252915 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -191,6 +191,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..5c157fb1f1 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,39 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*----------
+ * PartitionPruneStep - base type for nodes representing a partition pruning
+ * step
+ *----------
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+} PartitionPruneStep;
+
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *values;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND,
+ COMBINE_NOT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *argsteps;
+ List *argvalues;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
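To make the intent of the op/combine split easier to picture, the following standalone sketch shows one way a tree of such steps could be evaluated into a set of matching partitions. It is an assumption-laden illustration, not how get_matching_partitions() is implemented in the patch: partition sets are 32-bit masks instead of Bitmapsets, each op step carries a precomputed mask instead of looking up bounds, and COMBINE_NOT is left out. SketchStep, StepKind, SketchCombineOp and eval_step() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
	STEP_OP,					/* leaf: matches a fixed set of partitions */
	STEP_COMBINE				/* inner node: combines its argument steps */
} StepKind;

typedef enum
{
	SK_COMBINE_OR,
	SK_COMBINE_AND
} SketchCombineOp;

/* Hypothetical, simplified counterpart of the PartitionPruneStep nodes. */
typedef struct SketchStep
{
	StepKind	kind;
	uint32_t	matches;		/* STEP_OP only: bit i set if partition i matches */
	SketchCombineOp op;			/* STEP_COMBINE only */
	const struct SketchStep *const *args;	/* STEP_COMBINE only */
	size_t		nargs;
} SketchStep;

/*
 * Evaluate a step tree.  'all_parts' has one bit set for every existing
 * partition; the result is the subset of partitions that must be scanned.
 */
uint32_t
eval_step(const SketchStep *step, uint32_t all_parts)
{
	uint32_t	result;
	size_t		i;

	assert(step != NULL);

	if (step->kind == STEP_OP)
		return step->matches & all_parts;

	/* AND starts from "everything", OR starts from "nothing". */
	result = (step->op == SK_COMBINE_AND) ? all_parts : 0;
	for (i = 0; i < step->nargs; i++)
	{
		uint32_t	argresult = eval_step(step->args[i], all_parts);

		if (step->op == SK_COMBINE_AND)
			result &= argresult;
		else
			result |= argresult;
	}
	return result;
}
```

Under these assumptions, an OR over two leaves yields the union of their partition sets and an AND yields the intersection, which is the behaviour the OR_EXPR handling for ScalarArrayOpExpr earlier in the patch relies on.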
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..d75a23e4a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1089,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1103,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1125,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1141,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1153,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1276,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1339,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..71e86fc254 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1584,6 +1584,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1597,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepNoop
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
--
2.11.0
Attachment: v39-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 20dcad309ee5b249ce2eecd17eef87222db7d39e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v39 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 94 ++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 106 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0629607cf4..0da34271a6 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2291,21 +2291,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5079,9 +5064,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f64a2bf090..7eeaed3133 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -876,6 +876,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
+ /*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
rel->baserestrictinfo != NIL)
{
@@ -1323,6 +1334,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1333,7 +1350,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1360,49 +1376,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1421,9 +1443,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 50f858e420..f733075527 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -610,7 +610,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -625,6 +624,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1167,12 +1167,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1244,10 +1244,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1478,6 +1480,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1578,6 +1584,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1585,7 +1606,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6112,65 +6133,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 046f252915..7a14bbb10b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -263,7 +263,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 71e86fc254..6b8851509f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
Thanks for the review.
On 2018/03/21 6:29, Robert Haas wrote:
On Tue, Mar 20, 2018 at 7:07 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/03/16 21:55, Amit Langote wrote:
Attached updated patches.
Attached is a further revised version.
Of note is getting rid of PartitionPruneContext usage in the static
functions of partprune.c. Most of the code there ought to only run during
planning, so it can access the necessary information from RelOptInfo
directly instead of copying it to PartitionPruneContext and then passing
it around.
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ if (rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(root, rel);
+ did_pruning = true;
+ }
+ }
Use &&
Fixed in the latest version.
+ case COMBINE_OR:
+ {
Won't survive pgindent, which currently produces a *massive* diff for
these patches.
That's gone in the latest patch.
Things for the overall patch should have improved in the latest version.
+ /*
+ * XXX- The following ad-hoc method of pruning only works for list
+ * partitioning. It checks for each partition if all of its
+ * accepted values appear in ne_datums[].
+ */
So why are we doing it this way? How about doing something not
ad-hoc? I tried to propose that before.
Hmm, perhaps I should have written a better comment.
What I really meant to say is that pruning using <> operators can be
implemented sanely only for list partitions. We can prune a given
partition with <> clauses, only if we can find a <> clause for *all*
values that the partition accepts. Doing so seems doable only for list
partitions where we require enumerating all values that the partition may
contain.
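The rule described above (a list partition can be pruned by <> clauses only if every value it accepts is excluded by some <> clause) can be sketched roughly as follows. This is an illustration only, not code from the patch: the ListPartition struct and both function names are hypothetical, and plain ints stand in for Datums and their comparison functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list partition: the values it accepts. */
typedef struct ListPartition
{
	const int  *values;
	size_t		nvalues;
} ListPartition;

/* Does 'v' appear among the constants taken from "key <> const" clauses? */
static bool
in_ne_datums(int v, const int *ne_datums, size_t n_ne_datums)
{
	for (size_t i = 0; i < n_ne_datums; i++)
	{
		if (ne_datums[i] == v)
			return true;
	}
	return false;
}

/*
 * Returns true only if every value accepted by the partition is excluded
 * by some <> clause.  A single accepted value without a matching clause
 * means the partition may still contain qualifying rows.
 */
bool
list_partition_excluded_by_ne(const ListPartition *part,
							  const int *ne_datums, size_t n_ne_datums)
{
	/* Assumption of this sketch: be conservative for an empty value list. */
	if (part->nvalues == 0)
		return false;

	for (size_t i = 0; i < part->nvalues; i++)
	{
		if (!in_ne_datums(part->values[i], ne_datums, n_ne_datums))
			return false;
	}
	return true;
}
```

For range or hash partitions there is no finite list of accepted values to check against, which is why the same test cannot be applied there.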
+ * Set *value to the constant value obtained by evaluating 'expr'
+ *
+ * Note that we may not be able to evaluate the input expression, in which
+ * case, the function returns false to indicate that *value has not been
+ * set. True is returned otherwise.
These comments need updating, since this function (laudably) no longer
does any evaluating. I wonder how this will work for run-time
pruning, though.
Fixed the comment.
The run-time pruning patch adds some code to this function so that it can
return values for Params, for which, AFAIK, it also adds some state
information to the PartitionPruneContext argument.
+ if (context->partopcintype[partkeyidx] != exprTyp)
+ {
+     Oid new_supfuncid;
+     int16 procnum;
+
+
+     procnum = (context->strategy == PARTITION_STRATEGY_HASH)
+                 ? HASHEXTENDED_PROC
+                 : BTORDER_PROC;
+     new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
+                                       context->partopcintype[partkeyidx],
+                                       exprTyp, procnum);
+     fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
+ }
What's the point of this, exactly? Leftover dead code, maybe?
Actually, this *was* an effort to teach the patch to use the correct
comparison function when comparing against partition bounds in case the
clause value is of a different type.
After reading David's comment about this, I concluded that it was in the
wrong place, which is fixed in the latest patch. The comparison
functions are now changed (if needed) in the function that calls
partkey_datum_from_expr, not in partkey_datum_from_expr itself.
+ * Input:
+ * See the comments above the definition of PartScanKeyInfo to see what
+ * kind of information is contained in 'keys'.
There's no such thing as PartScanKeyInfo any more and the function has
no argument called 'keys'. None of the function's actual arguments are
explained.
Sorry, this should be gone in the latest patches.
+ /*
+  * If there are multiple pruning steps, we perform them one after another,
+  * passing the result of one step as input to another. Based on the type
+  * of pruning step, perform_pruning_step may add or remove partitions from
+  * the set of partitions it receives as the input.
+  */
The comment sounds great, but the code doesn't work that way; it
always calls bms_int_members to intersect the new result with any
previous result. I'm baffled as to how this manages to DTRT if
COMBINE_OR is used. In general I had hoped that the list of pruning
steps was something over which we were only going to iterate, not
recurse. This definitely recurses for the combine steps, but it's
still (sorta) got the idea of a list of iterable steps. That's a
weird mix.
At the top level (in get_matching_partitions), it is assumed that the
steps in the input list come from implicitly AND'd clauses, so we take the
intersection of the partition sets that we get for each step.
Anyway, after David's rewrite of this portion of the patch incorporated in
the latest patch, things look a bit different here, although there is
still recursion for combine steps. I'm still considering how to make the
recursion go away.
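The combining behaviour discussed here (intersect the results of implicitly AND'd steps, union the results of the arguments of an OR step) can be sketched as below. This is only an illustration under stated assumptions: a uint64_t bitmask stands in for a Bitmapset, so at most 64 partitions are supported, and PartSet, SketchCombineOp, all_partitions and combine_results are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t PartSet;		/* bit i set => partition i must be scanned */

typedef enum
{
	SKETCH_AND,					/* all argument steps must match */
	SKETCH_OR					/* any argument step may match */
} SketchCombineOp;

/* The set containing partitions 0 .. nparts - 1. */
PartSet
all_partitions(int nparts)
{
	assert(nparts >= 0 && nparts <= 64);
	return (nparts == 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
}

/*
 * Combine the partition sets produced by 'n' already-executed steps.
 * AND starts from "all partitions" and intersects; OR starts from the
 * empty set and unions.
 */
PartSet
combine_results(SketchCombineOp op, const PartSet *results, size_t n,
				int nparts)
{
	PartSet		acc = (op == SKETCH_AND) ? all_partitions(nparts) : 0;

	for (size_t i = 0; i < n; i++)
	{
		if (op == SKETCH_AND)
			acc &= results[i];
		else
			acc |= results[i];
	}
	return acc;
}
```

Because each argument's result is computed before it is combined, the same loop works whether the arguments are evaluated recursively or looked up from a flat array of earlier step results.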
+ if (nvalues == context->partnatts)
+ {
+     greatest_modulus = get_greatest_modulus(boundinfo);
+     rowHash = compute_hash_value(partnatts, partsupfunc, values,
+                                  isnull);
+     result_index = partindices[rowHash % greatest_modulus];
+     if (result_index >= 0)
+         return bms_make_singleton(result_index);
+ }
+ else
+     /* Can't do pruning otherwise, so return all partitions. */
+     return bms_add_range(NULL, 0, context->nparts - 1);
Wouldn't we want to (1) arrange things so that this function is never
called if nvalues < context->partnatts && context->strategy ==
PARTITION_STRATEGY_HASH or at least (2) avoid constructing isnull from
nullkeys if we're not going to use it?
We call this function even if nvalues < context->partnatts but we found IS
NULL clauses for *all* the remaining columns.
Although, given the checks that the planner (partprune.c) performs, we should
get here only if pruning is possible, so the code to handle cases where
pruning couldn't occur is redundant.
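A rough sketch of the hash-partition case discussed above: pruning is possible only when every partition key column has either a value or an IS NULL clause, and the row hash modulo the greatest modulus then selects at most one partition. Both function names are hypothetical, and the partindices array here is a simplified stand-in for the one in the quoted hunk (an entry of -1 means no partition accepts that remainder).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Every key column must be covered, either by a comparison value or by an
 * IS NULL clause; otherwise the row hash cannot be computed.
 */
bool
hash_pruning_possible(int nvalues, int nnullkeys, int partnatts)
{
	return nvalues + nnullkeys == partnatts;
}

/*
 * Returns the index of the only partition that can hold a row with the
 * given hash, or -1 if no partition accepts that remainder.
 */
int
hash_partition_for(uint64_t row_hash, const int *partindices,
				   int greatest_modulus)
{
	assert(greatest_modulus > 0);
	return partindices[row_hash % (uint64_t) greatest_modulus];
}
```

If the caller has already established that pruning is possible, the "return all partitions" fallback in the quoted hunk is never reached, which is the redundancy noted above.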
Also, shouldn't we be sanity-checking the strategy number here?
That's right, fixed.
I'm out of time for right now but it looks to me like this patch still
needs quite a bit of fine-tuning.
I have posted an updated patch in reply to David's review.
Thanks,
Amit
On 24 March 2018 at 01:15, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
In the updated patch (that is, after incorporating your changes), I have
moved this partsupfunc switching to the caller of partkey_datum_from_expr
instead of doing it there. The new patch also checks that the returned
function OID is valid; if it is not, we don't use the expression's value
for pruning.
Thanks for accepting those changes.
So now, we statically allocate a partsupfunc array on every invocation of
perform_pruning_base_step() or of get_partitions_excluded_by_ne_datums().
Considering run-time pruning, we may have to find some other place to
cache that.
Hmm, yeah, it's not perfect, but I don't have any better ideas for now,
apart from the thought that this could probably be done when creating the
steps rather than when executing them. That would save having to look up
the correct function Oid during execution, and save bothering to create
steps for values that we simply can't compare to the partition key.
I've done this in the attached patch against v39.
I also renamed argvalues to argexprs, since they're not values. The
PartClauseInfo could probably do with the same change too, but I
didn't touch it.
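The idea of resolving the comparison function when a step is created, rather than when it is executed, can be sketched as follows. This is a simplified illustration and not the patch's code: SketchType, SketchStep, lookup_cmp, make_step and run_step are all hypothetical, with lookup_cmp standing in for a catalog lookup such as get_opfamily_proc and longs standing in for Datums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument types; only the integer ones are comparable. */
typedef enum
{
	SK_INT4,
	SK_INT8,
	SK_TEXT
} SketchType;

typedef int (*CmpFn) (long bound, long arg);

/* A pruning step that carries its comparison function with it. */
typedef struct SketchStep
{
	long		arg;
	CmpFn		cmp;
} SketchStep;

static int
cmp_long(long a, long b)
{
	return (a > b) - (a < b);
}

/* Stand-in for the catalog lookup; NULL means "no usable function". */
static CmpFn
lookup_cmp(SketchType argtype)
{
	switch (argtype)
	{
		case SK_INT4:
		case SK_INT8:
			return cmp_long;
		default:
			return NULL;
	}
}

/*
 * Step creation: resolve the comparison function once.  If none exists,
 * no step is built, so nothing unusable ever reaches execution.
 */
bool
make_step(SketchType argtype, long arg, SketchStep *step)
{
	CmpFn		cmp = lookup_cmp(argtype);

	if (cmp == NULL)
		return false;
	step->arg = arg;
	step->cmp = cmp;
	return true;
}

/* Step execution: no lookup, just call the cached function. */
int
run_step(const SketchStep *step, long bound)
{
	return step->cmp(bound, step->arg);
}
```

In the delta patch the step stores a function Oid rather than a function pointer, so execution still needs to turn that Oid into an FmgrInfo when it differs from the partition key's default.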
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v39_drowley_delta1.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2df2530d5e3..e7809c8937a 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1813,9 +1813,10 @@ perform_pruning_combine_step(PartitionPruneContext *context,
{
Bitmapset *stepresult;
Datum *ne_datums;
- int n_ne_datums = list_length(cstep->argvalues),
+ FmgrInfo **partsupfuncs;
+ ListCell *lc2;
+ int n_ne_datums = list_length(cstep->argexprs),
i;
- FmgrInfo **partsupfunc;
/*
* Apply not-equal clauses. This only applies in the list
@@ -1823,18 +1824,20 @@ perform_pruning_combine_step(PartitionPruneContext *context,
* we have knowledge of the entire set of values that can be
* stored in a given partition.
*/
- ne_datums = (Datum *) palloc0(n_ne_datums * sizeof(Datum));
+ ne_datums = (Datum *) palloc(n_ne_datums * sizeof(Datum));
/*
* Some datums may require different comparison function than
* the default partitioning-specific one.
*/
- partsupfunc = (FmgrInfo **)
- palloc0(n_ne_datums * sizeof(FmgrInfo *));
+ partsupfuncs = (FmgrInfo **)
+ palloc(n_ne_datums * sizeof(FmgrInfo *));
+
i = 0;
- foreach(lc, cstep->argvalues)
+ forboth(lc, cstep->argexprs, lc2, cstep->argcmpfuncoids)
{
- Expr *expr = lfirst(lc);
+ Expr *expr = (Expr *) lfirst(lc);
+ Oid cmpfuncoid = lfirst_oid(lc2);
Datum datum;
/*
@@ -1843,38 +1846,28 @@ perform_pruning_combine_step(PartitionPruneContext *context,
*/
if (partkey_datum_from_expr(context, expr, &datum))
{
- Oid exprTyp = exprType((Node *) expr);
-
/*
- * Check if we need to use a different comparison
- * function for this value.
+ * If this datum is not the same type as the partition
+ * key then we'll need to use the comparison function
+ * for that type. We'll need to lookup the FmgrInfo.
*/
- if (context->partopcintype[0] != exprTyp)
+ if (cmpfuncoid != context->partsupfunc[0].fn_oid)
{
- Oid cmpfn;
-
- cmpfn = get_opfamily_proc(context->partopfamily[0],
- context->partopcintype[0],
- exprTyp, BTORDER_PROC);
- if (OidIsValid(cmpfn))
- {
- partsupfunc[i] = palloc0(sizeof(FmgrInfo));
- fmgr_info(cmpfn, partsupfunc[i]);
- }
- else /* Can't really use datum for pruning. */
- continue;
+ partsupfuncs[i] = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo));
+ fmgr_info(cmpfuncoid, partsupfuncs[i]);
}
else
- partsupfunc[i] = &context->partsupfunc[0];
-
- ne_datums[i++] = datum;
+ partsupfuncs[i] = &context->partsupfunc[0];
}
+
+ ne_datums[i++] = datum;
}
stepresult = get_partitions_excluded_by_ne_datums(context,
ne_datums,
i,
- partsupfunc);
+ partsupfuncs);
/* All partitions apart from the stepresult partitions match */
result = bms_add_range(NULL, 0, context->nparts - 1);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0da34271a6a..adc24b44394 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2158,7 +2158,8 @@ _copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
COPY_SCALAR_FIELD(combineOp);
COPY_NODE_FIELD(argsteps);
- COPY_NODE_FIELD(argvalues);
+ COPY_NODE_FIELD(argexprs);
+ COPY_NODE_FIELD(argcmpfuncoids);
return newnode;
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 0a3e32ecd12..bec265c8896 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2160,7 +2160,9 @@ expression_tree_walker(Node *node,
if (walker((Node *) cstep->argsteps, context))
return true;
- if (walker((Node *) cstep->argvalues, context))
+ if (walker((Node *) cstep->argexprs, context))
+ return true;
+ if (walker((Node *) cstep->argcmpfuncoids, context))
return true;
}
break;
@@ -2968,7 +2970,8 @@ expression_tree_mutator(Node *node,
FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
MUTATE(newnode->argsteps, cstep->argsteps, List *);
- MUTATE(newnode->argvalues, cstep->argvalues, List *);
+ MUTATE(newnode->argexprs, cstep->argexprs, List *);
+ MUTATE(newnode->argcmpfuncoids, cstep->argcmpfuncoids, List *);
return (Node *) newnode;
}
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 6e7bec429be..39b639c7b94 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -17,6 +17,7 @@
#include "postgres.h"
#include "access/hash.h"
+#include "access/nbtree.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_type.h"
@@ -41,6 +42,8 @@ typedef struct PartClauseInfo
Oid opno; /* operator used to compare partkey to 'value' */
Expr *value; /* The value the partition key is being
* compared to */
+ Oid cmpfuncoid; /* Oid of function to compare this to the
+ * partition key */
/* cached info. */
int op_strategy;
@@ -321,7 +324,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
combineStep = makeNode(PartitionPruneStepCombine);
combineStep->combineOp = COMBINE_OR;
combineStep->argsteps = all_arg_steps;
- combineStep->argvalues = NIL;
+ combineStep->argexprs = NIL;
+ combineStep->argcmpfuncoids = NIL;
result = lappend(result, combineStep);
continue;
}
@@ -340,7 +344,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
combineStep = makeNode(PartitionPruneStepCombine);
combineStep->combineOp = COMBINE_AND;
combineStep->argsteps = argsteps;
- combineStep->argvalues = NIL;
+ combineStep->argexprs = NIL;
+ combineStep->argcmpfuncoids = NIL;
result = lappend(result, combineStep);
continue;
}
@@ -449,7 +454,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
/* Combine values from all <> operator clauses into one prune step. */
if (ne_clauses != NIL)
{
- List *argvalues = NIL;
+ List *argexprs = NIL;
+ List *argcmpfuncs = NIL;
PartitionPruneStepCombine *combineStep;
Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
@@ -457,13 +463,15 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
{
PartClauseInfo *pc = lfirst(lc);
- argvalues = lappend(argvalues, pc->value);
+ argexprs = lappend(argexprs, pc->value);
+ argcmpfuncs = lappend_oid(argcmpfuncs, pc->cmpfuncoid);
}
combineStep = makeNode(PartitionPruneStepCombine);
combineStep->combineOp = COMBINE_NOT;
combineStep->argsteps = NIL;
- combineStep->argvalues = argvalues;
+ combineStep->argexprs = argexprs;
+ combineStep->argcmpfuncoids = argcmpfuncs;
result = lappend(result, combineStep);
}
@@ -802,16 +810,19 @@ match_clause_to_partition_key(RelOptInfo *rel,
/* Do pruning with the Boolean equality operator. */
(*pc)->opno = BooleanEqualOperator;
(*pc)->value = value;
+ (*pc)->cmpfuncoid = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
return PARTCLAUSE_MATCH_CLAUSE;
}
- else if (IsA(clause, OpExpr) &&list_length(((OpExpr *) clause)->args) == 2)
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
{
OpExpr *opclause = (OpExpr *) clause;
Expr *leftop,
*rightop;
Oid commutator = InvalidOid,
negator = InvalidOid;
+ Oid cmpfuncoid;
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
@@ -898,6 +909,20 @@ match_clause_to_partition_key(RelOptInfo *rel,
return PARTCLAUSE_UNSUPPORTED;
}
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ cmpfuncoid = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprType((Node *) value),
+ HASHEXTENDED_PROC);
+ else
+ cmpfuncoid = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprType((Node *) value),
+ BTORDER_PROC);
+
+ if (!OidIsValid(cmpfuncoid))
+ return PARTCLAUSE_UNSUPPORTED;
+
*pc = palloc0(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
@@ -914,6 +939,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
(*pc)->opno = opclause->opno;
(*pc)->value = value;
+ (*pc)->cmpfuncoid = cmpfuncoid;
return PARTCLAUSE_MATCH_CLAUSE;
}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 5c157fb1f1e..507cc87af1c 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1538,7 +1538,9 @@ typedef struct PartitionPruneStepCombine
PartitionPruneCombineOp combineOp;
List *argsteps;
- List *argvalues;
+ List *argexprs; /* Expressions to compare to the partition key */
+ List *argcmpfuncoids; /* Comparison function Oid used to compare the
+ * argexprs to the partition key */
} PartitionPruneStepCombine;
#endif /* PRIMNODES_H */
On 24 March 2018 at 16:42, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've done this in the attached patch against v39.
I also renamed argvalues to argexprs, since they're not values. The
PartClauseInfo could probably do with the same change too, but I
didn't touch it.
The attached goes a little further and does a bit more renaming. I
don't think "values" is a good name for a list of Exprs. I'd expect
that might be a better-suited name for an array of Datums.
I've also added and modified a few comments. More comments are still
required. The Step structs are still mostly undocumented, but I'm
still trying to understand how all this fits together, at least well
enough to write about it.
The attached delta applies on top of v39 plus delta1.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v39_drowley_delta2.patch (application/octet-stream)
On 25 March 2018 at 18:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
The attached delta applies on top of v39 plus delta1.
Sorry, the attached should do this. Ignore the last attachment.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v39_drowley_delta2_1.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e7809c8937a..b40e955d9e2 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1662,7 +1662,7 @@ perform_pruning_base_step(PartitionPruneContext *context,
FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
nvalues = 0;
- lc = list_head(opstep->values);
+ lc = list_head(opstep->exprs);
/* Generate the partition look-up key. */
for (keyno = 0; keyno < context->partnatts; keyno++)
@@ -1875,7 +1875,7 @@ perform_pruning_combine_step(PartitionPruneContext *context,
}
default:
- elog(ERROR, "Invalid PartitionPruneCombineOp: %d", (int)
+ elog(ERROR, "invalid PartitionPruneCombineOp: %d", (int)
cstep->combineOp);
return NULL; /* keep compiler quiet */
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index adc24b44394..0f2e1d57dfd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2142,7 +2142,7 @@ _copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
COPY_SCALAR_FIELD(opstrategy);
- COPY_NODE_FIELD(values);
+ COPY_NODE_FIELD(exprs);
COPY_BITMAPSET_FIELD(nullkeys);
return newnode;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index bec265c8896..dab62eb1583 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2150,7 +2150,7 @@ expression_tree_walker(Node *node,
{
PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
- if (walker((Node *) opstep->values, context))
+ if (walker((Node *) opstep->exprs, context))
return true;
}
break;
@@ -2958,7 +2958,7 @@ expression_tree_mutator(Node *node,
PartitionPruneStepOp *newnode;
FLATCOPY(newnode, opstep, PartitionPruneStepOp);
- MUTATE(newnode->values, opstep->values, List *);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
return (Node *) newnode;
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 7eeaed31335..03b94f65932 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -887,6 +887,13 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
rel->partitioned_child_rels = list_make1_int(rti);
+ /*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
rel->baserestrictinfo != NIL)
{
@@ -1142,12 +1149,12 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
- if (IS_PARTITIONED_REL(rel) && did_pruning &&
+ if (did_pruning &&
!bms_is_member(appinfo->child_relid, live_children))
{
/*
- * This child need not be scanned, so we can omit it from the
- * appendrel.
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
*/
set_dummy_rel_pathlist(childrel);
continue;
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 39b639c7b94..3fbc6c51de2 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -39,10 +39,10 @@
typedef struct PartClauseInfo
{
int keyno; /* Partition key number (0 to partnatts - 1) */
- Oid opno; /* operator used to compare partkey to 'value' */
- Expr *value; /* The value the partition key is being
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
* compared to */
- Oid cmpfuncoid; /* Oid of function to compare this to the
+ Oid cmpfuncoid; /* Oid of function to compare 'expr' to the
* partition key */
/* cached info. */
@@ -70,25 +70,24 @@ static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
Expr *partkey, Expr **rightop);
static List *get_steps_using_prefix(int step_opstrategy,
- Expr *step_lastvalue,
+ Expr *step_lastexpr,
int step_lastkeyno,
Bitmapset *step_nullkeys,
List *prefix);
static List *get_steps_using_prefix_recurse(int step_opstrategy,
- Expr *step_lastvalue,
+ Expr *step_lastexpr,
int step_lastkeyno,
Bitmapset *step_nullkeys,
List *prefix,
ListCell *start,
- List *step_values);
+ List *step_exprs);
/*
* prune_append_rel_partitions
- * Returns RT indexes of relations belonging to the minimum set of
- * partitions which must be scanned to satisfy rel's baserestrictinfo
- * quals or NULL if no partitions exist.
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
*
- * Only call this if 'rel' corresponds to a partitioned table.
+ * Callers must ensure that 'rel' is a partitioned table.
*/
Relids
prune_append_rel_partitions(RelOptInfo *rel)
@@ -318,6 +317,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
}
*constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
if (*constfalse)
return NIL;
@@ -427,13 +428,16 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
break;
case PARTCLAUSE_MATCH_CONTRADICT:
- /* Nothing to do here. */
+ /* We've nothing more to do if a contradiction was found. */
*constfalse = true;
return NIL;
case PARTCLAUSE_NOMATCH:
- /* go check for the next key. */
- break;
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
case PARTCLAUSE_UNSUPPORTED:
/* This clause cannot be used for pruning. */
@@ -451,7 +455,9 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
}
}
- /* Combine values from all <> operator clauses into one prune step. */
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ */
if (ne_clauses != NIL)
{
List *argexprs = NIL;
@@ -463,7 +469,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
{
PartClauseInfo *pc = lfirst(lc);
- argexprs = lappend(argexprs, pc->value);
+ argexprs = lappend(argexprs, pc->expr);
argcmpfuncs = lappend_oid(argcmpfuncs, pc->cmpfuncoid);
}
@@ -659,13 +665,13 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
continue;
/*
- * Considering pc->value as the last value in the
+ * Considering pc->expr as the last value in the
* pruning tuple, try to generate pruning steps for
* tuples containing various combinations of values
* for earlier columns from the clauses in prefix.
*/
pc_steps = get_steps_using_prefix(pc->op_strategy,
- pc->value,
+ pc->expr,
pc->keyno,
NULL,
prefix);
@@ -700,7 +706,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
for_each_cell(lc1, lc)
{
pc_steps = get_steps_using_prefix(pc->op_strategy,
- pc->value,
+ pc->expr,
pc->keyno,
nullkeys,
prefix);
@@ -794,7 +800,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
bool *is_neop_listp)
{
PartitionScheme part_scheme = rel->part_scheme;
- Expr *value;
+ Expr *expr;
Oid partopfamily = part_scheme->partopfamily[partkeyidx],
partcoll = part_scheme->partcollation[partkeyidx];
@@ -802,14 +808,14 @@ match_clause_to_partition_key(RelOptInfo *rel,
* Recognize specially shaped clauses that match with the Boolean
* partition key.
*/
- if (match_boolean_partition_clause(partopfamily, clause, partkey, &value))
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
{
*is_neop_listp = false;
*pc = palloc0(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
/* Do pruning with the Boolean equality operator. */
(*pc)->opno = BooleanEqualOperator;
- (*pc)->value = value;
+ (*pc)->expr = expr;
(*pc)->cmpfuncoid = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
return PARTCLAUSE_MATCH_CLAUSE;
@@ -833,10 +839,10 @@ match_clause_to_partition_key(RelOptInfo *rel,
/* check if the clause matches this partition key */
if (equal(leftop, partkey))
- value = rightop;
+ expr = rightop;
else if (equal(rightop, partkey))
{
- value = leftop;
+ expr = leftop;
commutator = get_commutator(opclause->opno);
/* nothing we can do unless we can swap the operands */
@@ -869,8 +875,8 @@ match_clause_to_partition_key(RelOptInfo *rel,
if (!op_strict(opclause->opno))
return PARTCLAUSE_UNSUPPORTED;
- /* We can't use any volatile value to prune partitions. */
- if (contain_volatile_functions((Node *) value))
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
return PARTCLAUSE_UNSUPPORTED;
/*
@@ -912,12 +918,12 @@ match_clause_to_partition_key(RelOptInfo *rel,
if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
cmpfuncoid = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
part_scheme->partopcintype[partkeyidx],
- exprType((Node *) value),
+ exprType((Node *) expr),
HASHEXTENDED_PROC);
else
cmpfuncoid = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
part_scheme->partopcintype[partkeyidx],
- exprType((Node *) value),
+ exprType((Node *) expr),
BTORDER_PROC);
if (!OidIsValid(cmpfuncoid))
@@ -938,7 +944,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
else
(*pc)->opno = opclause->opno;
- (*pc)->value = value;
+ (*pc)->expr = expr;
(*pc)->cmpfuncoid = cmpfuncoid;
return PARTCLAUSE_MATCH_CLAUSE;
@@ -1201,25 +1207,25 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
*/
static List *
get_steps_using_prefix(int step_opstrategy,
- Expr *step_lastvalue,
+ Expr *step_lastexpr,
int step_lastkeyno,
Bitmapset *step_nullkeys,
List *prefix)
{
- /* Quick exit if there are no values to prefix lastvalue with. */
+ /* Quick exit if there are no values to prefix lastexpr with. */
if (list_length(prefix) == 0)
{
PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
step->opstrategy = step_opstrategy;
- step->values = list_make1(step_lastvalue);
+ step->exprs = list_make1(step_lastexpr);
step->nullkeys = step_nullkeys;
return list_make1(step);
}
return get_steps_using_prefix_recurse(step_opstrategy,
- step_lastvalue,
+ step_lastexpr,
step_lastkeyno,
step_nullkeys,
prefix,
@@ -1229,12 +1235,12 @@ get_steps_using_prefix(int step_opstrategy,
static List *
get_steps_using_prefix_recurse(int step_opstrategy,
- Expr *step_lastvalue,
+ Expr *step_lastexpr,
int step_lastkeyno,
Bitmapset *step_nullkeys,
List *prefix,
ListCell *start,
- List *step_values)
+ List *step_exprs)
{
List *result = NIL;
ListCell *lc;
@@ -1244,23 +1250,23 @@ get_steps_using_prefix_recurse(int step_opstrategy,
step_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
if (step_keyno == step_lastkeyno - 1)
{
- Assert(list_length(step_values) == step_keyno);
+ Assert(list_length(step_exprs) == step_keyno);
for_each_cell(lc, start)
{
PartClauseInfo *pc = lfirst(lc);
PartitionPruneStepOp *step;
- List *step_values1;
+ List *step_exprs1;
if (pc->keyno > step_keyno)
break;
- step_values1 = list_copy(step_values);
- step_values1 = lappend(step_values1, pc->value);
- step_values1 = lappend(step_values1, step_lastvalue);
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
step = makeNode(PartitionPruneStepOp);
step->opstrategy = step_opstrategy;
- step->values = step_values1;
+ step->exprs = step_exprs1;
step->nullkeys = step_nullkeys;
result = lappend(result, step);
}
@@ -1284,23 +1290,23 @@ get_steps_using_prefix_recurse(int step_opstrategy,
pc = lfirst(lc);
if (pc->keyno == 0)
{
- list_free(step_values);
- step_values = list_make1(pc->value);
+ list_free(step_exprs);
+ step_exprs = list_make1(pc->expr);
}
else if (pc->keyno == step_keyno)
- step_values = lappend(step_values, pc->value);
+ step_exprs = lappend(step_exprs, pc->expr);
else
break;
result =
list_concat(result,
list_copy(get_steps_using_prefix_recurse(step_opstrategy,
- step_lastvalue,
+ step_lastexpr,
step_lastkeyno,
step_nullkeys,
prefix,
next_start,
- step_values)));
+ step_exprs)));
}
}
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 507cc87af1c..301d1f3bc03 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1521,7 +1521,7 @@ typedef struct PartitionPruneStepOp
PartitionPruneStep step;
int opstrategy;
- List *values;
+ List *exprs;
Bitmapset *nullkeys;
} PartitionPruneStepOp;
@@ -1530,7 +1530,7 @@ typedef enum PartitionPruneCombineOp
COMBINE_OR,
COMBINE_AND,
COMBINE_NOT
-} PartitionPruneCombineOp;
+} PartitionPruneCombineOp;
typedef struct PartitionPruneStepCombine
{
Hi David.
On 2018/03/25 14:32, David Rowley wrote:
On 25 March 2018 at 18:28, David Rowley <david.rowley@2ndquadrant.com> wrote:
The attached delta applies on top of v39 plus delta1.
Sorry, the attached should do this. Ignore the last attachment.
I have incorporated both of your delta1 and delta2_1 patches.
Your proposed change to determine the cross-type comparison function OID
during planning itself is a good one, although I wasn't sure why it was
done only for the <> operators. I also implemented that for
PartitionPruneStepOp steps.
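
As an illustration of why the comparison function has to match the type of the clause's constant, here is a minimal standalone sketch in plain C. It is not code from the patch: `CmpFnId`, `TypeTag`, `choose_cmpfn` and `compare_bound` are hypothetical stand-ins for the function OIDs and FmgrInfo lookups, and only a smallint key with a smallint or bigint constant is modelled. The function is chosen once ("plan time") and merely dispatched on later.

```c
#include <stdint.h>

/* Hypothetical stand-ins for comparison function OIDs. */
typedef enum { CMP_INT2_INT2, CMP_INT2_INT8 } CmpFnId;

/* Hypothetical type tags for the clause's constant. */
typedef enum { TYPE_INT2, TYPE_INT8 } TypeTag;

/* "Plan time": choose the comparison function from the constant's type. */
static CmpFnId
choose_cmpfn(TypeTag consttype)
{
	return consttype == TYPE_INT8 ? CMP_INT2_INT8 : CMP_INT2_INT2;
}

/* Same-type comparison: only the low 16 bits of the constant survive. */
static int
cmp_int2_int2(int16_t bound, int64_t value)
{
	int16_t		narrowed = (int16_t) (uint16_t) value;

	return (bound > narrowed) - (bound < narrowed);
}

/* Cross-type comparison: the bound is widened, so nothing is lost. */
static int
cmp_int2_int8(int16_t bound, int64_t value)
{
	int64_t		widened = bound;

	return (widened > value) - (widened < value);
}

/* "Run time": dispatch on the function that was chosen during planning. */
static int
compare_bound(CmpFnId fn, int16_t bound, int64_t value)
{
	return fn == CMP_INT2_INT8 ? cmp_int2_int8(bound, value)
		: cmp_int2_int2(bound, value);
}
```

Since 100000000000000 = 2^14 * 5^14, its low 16 bits are exactly 16384. In this sketch the same-type comparison therefore reports the constant as equal to a bound of 16384, while the cross-type comparison correctly reports the bound as smaller. That is presumably why the regression tests in 0002 pair that constant with a partition for the value 16384.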
Also, I started thinking that implementing pruning using <> operators with
a PartitionPruneCombineOp was not such a great idea. That required us to
add argexprs and argcmpfns to that struct, which seemed a bit odd. I
defined a new pruning node type called PartitionPruneStepOpNe, which still
seems a bit odd, but given that our support for pruning using <> is quite
specialized, that may be fine.
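
The rule behind pruning with <> can be sketched as follows: a list partition may be skipped only if every value it accepts appears among the <> constants, and a default partition can never be skipped this way. The code below is a simplified standalone model, not the patch's implementation; `ListPartition`, `contains` and `excluded_by_ne` are hypothetical names, keys are single characters, and the accepted values 'a' and 'd' for a partition like lp_ad are assumed from its name.

```c
#include <stdbool.h>

/* Hypothetical description of one list partition. */
typedef struct
{
	const char *values;		/* accepted single-character key values */
	bool		is_default;	/* default partition accepts anything else */
} ListPartition;

/* Is character c a member of the NUL-terminated set? */
static bool
contains(const char *set, char c)
{
	for (; *set != '\0'; set++)
		if (*set == c)
			return true;
	return false;
}

/*
 * Returns true if no row in the partition can satisfy all "key <> v"
 * clauses, i.e. every accepted value is one of the excluded constants.
 */
static bool
excluded_by_ne(const ListPartition *part, const char *ne_values)
{
	const char *v;

	/* A default or empty value list gives us nothing to reason about. */
	if (part->is_default || part->values[0] == '\0')
		return false;

	for (v = part->values; *v != '\0'; v++)
		if (!contains(ne_values, *v))
			return false;	/* some accepted value is still allowed */

	return true;
}
```

Under these assumptions, the clauses a <> 'a' and a <> 'd' together exclude a partition accepting only 'a' and 'd', whereas a <> 'a' alone excludes nothing. The same reasoning does not carry over to range partitions, because a range bound does not enumerate the values a partition can hold.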
I added a bunch of hopefully informative comments in partprune.c and for
the struct definitions of pruning step nodes.
Please find attached a new version.
Thanks,
Amit
Attachments:
v40-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From d494978cf0f952a88a9942aa8584f42225f6a43d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v40 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..093ca5208e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v40-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From d8dd478d9773505f6b0836781ecbd1a3979eace3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v40 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v40-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From ae3fdea538b01414e23225b0ef356582b062e83e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v40 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1020 +++++++++++++++++
src/backend/nodes/copyfuncs.c | 52 +
src/backend/nodes/nodeFuncs.c | 61 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1471 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 92 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 318 ++++--
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 6 +
18 files changed, 3118 insertions(+), 86 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 53855f5088..0e47d9a0de 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -193,6 +193,30 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step);
+static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static Bitmapset *perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1553,9 +1577,1005 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ /* If there are no pruning steps then all partitions match. */
+ if (pruning_steps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ else
+ {
+ Bitmapset *result = NULL;
+ ListCell *lc;
+ bool firststep = true;
+
+ /*
+ * Below we process each partition pruning step one by one. With each
+ * step we intersect the result with the previously taken steps so
+ * that we end up with a minimal set of matching partition indexes.
+ * When performing the first step, we take the entire result, so we've
+ * something to intersect on subsequent steps.
+ */
+
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *stepresult;
+
+ stepresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = stepresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, stepresult);
+ }
+ return result;
+ }
+}
+
/* Module-local functions */
/*
+ * perform_pruning_step
+ * Performs one PartitionPruneStep
+ */
+static Bitmapset *
+perform_pruning_step(PartitionPruneContext *context,
+ PartitionPruneStep *step)
+{
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ return perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+
+ case T_PartitionPruneStepOpNe:
+ return perform_pruning_base_step_ne(context,
+ (PartitionPruneStepOpNe *) step);
+
+ case T_PartitionPruneStepCombine:
+ return perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step);
+
+ default:
+ elog(ERROR, "invalid partition pruning step: %d", nodeTag(step));
+ return NULL; /* keep compiler quiet */
+ }
+}
+
+/*
+ * perform_pruning_base_step
+ * Returns indexes of partitions obtained by executing 'opstep'.
+ */
+static Bitmapset *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * get_partitions_for_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ if (lc2 == NULL)
+ elog(ERROR, "incomplete cmpfnids list in pruning step");
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_base_step_ne
+ * Returns indexes of partitions obtained by executing 'nestep'.
+ */
+static Bitmapset *
+perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep)
+{
+ Bitmapset *excluded_parts,
+ *result;
+ Datum *ne_datums;
+ FmgrInfo **partsupfunc;
+ ListCell *lc1;
+ ListCell *lc2;
+ int n_ne_datums = list_length(nestep->exprs),
+ i;
+
+ /*
+ * Apply not-equal clauses. This only applies in the list
+ * partitioning case as this is the only partition type where
+ * we have knowledge of the entire set of values that can be
+ * stored in a given partition.
+ */
+ ne_datums = (Datum *) palloc(n_ne_datums * sizeof(Datum));
+
+ /*
+ * Some datums may require different comparison function than
+ * the default partitioning-specific one.
+ */
+ partsupfunc = (FmgrInfo **)
+ palloc(n_ne_datums * sizeof(FmgrInfo *));
+
+ i = 0;
+ forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ Oid cmpfn = lfirst_oid(lc2);
+ Datum datum;
+
+ /*
+ * List partitioning allows only one partition key, so key 0's
+ * cached comparison function is the one checked against below.
+ */
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ /*
+ * If this datum is not the same type as the partition
+ * key then we'll need to use the comparison function
+ * for that type. We'll need to look up the FmgrInfo.
+ */
+ if (cmpfn != context->partsupfunc[0].fn_oid)
+ {
+ partsupfunc[i] = (FmgrInfo *) palloc(sizeof(FmgrInfo));
+ fmgr_info(cmpfn, partsupfunc[i]);
+ }
+ else
+ partsupfunc[i] = &context->partsupfunc[0];
+ /* Only record datums that were actually extracted. */
+ ne_datums[i++] = datum;
+ }
+ }
+
+ excluded_parts = get_partitions_excluded_by_ne_datums(context, ne_datums, i,
+ partsupfunc);
+ pfree(ne_datums);
+
+ /* All partitions apart from those in excluded_parts match */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ return bms_del_members(result, excluded_parts);
+}
+
+
+
+/*
+ * perform_pruning_combine_step
+ * Returns indexes of partitions obtained by executing 'cstep'.
+ */
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep)
+{
+ Bitmapset *result;
+ ListCell *lc;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't contain
+ * any argument steps, to signal that no partitions should be pruned. So,
+ * return all partitions in that case.
+ */
+ if (cstep->argsteps == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ {
+ result = NULL;
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
+
+ /*
+ * We need to recurse here to account for the fact that
+ * the argument's pruning step could arbitrarily be of
+ * any type.
+ */
+ argresult = perform_pruning_step(context, step);
+
+ /* Add argresult to result. */
+ result = bms_add_members(result, argresult);
+ }
+
+ return result;
+ }
+
+ case COMBINE_AND:
+ {
+ bool firststep;
+
+ firststep = true;
+ result = NULL;
+ foreach(lc, cstep->argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+ Bitmapset *argresult;
+
+ argresult = perform_pruning_step(context, step);
+
+ if (firststep)
+ {
+ result = argresult;
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result, argresult);
+ }
+
+ return result;
+ }
+
+ default:
+ elog(ERROR, "invalid PartitionPruneCombineOp: %d", (int)
+ cstep->combineOp);
+ return NULL; /* keep compiler quiet */
+ }
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Determine the minimum set of partitions matching the specified values
+ * using hash partitioning.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ * 'values' contains the values to be used for pruning, each stored at the
+ * array position of its partition key.
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus,
+ result_index;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the minimum set of partitions matching the specified values
+ * using list partitioning.
+ *
+ * 'nvalues', if non-zero, must be exactly 1, as there is only one list key.
+ * 'value' contains the value to use for pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains list partitioning comparison function to be used to
+ * perform partition_list_bsearch
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(default_index);
+ else
+ result = NULL;
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the offset of the last datum we have a
+ * partition for, the only possible partition that could contain a match is
+ * the default partition, which is already set in 'result' if one
+ * exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off < 0, the value precedes the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, which is already set in 'result' if one
+ * exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the minimum set of partitions matching the specified values
+ * using range partitioning.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains range partitioning comparison function to be used to
+ * perform partition_range_datum_bsearch or partition_rbound_datum_cmp
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result = bms_add_range(result, partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(partindices[off + 1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /* Matched a prefix of the partition bound at off. */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off + 1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off + 1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so include
+ * all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /* Matched a prefix of the partition bound at off. */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /* Matched prefix of the partition bound at off. */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys could
+ * be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ *datums_in_query = NULL,
+ *datums_in_part = NULL,
+ i;
+ Bitmapset *excluded_parts = NULL;
+ Bitmapset *foundoffsets = NULL;
+
+ /*
+ * We can only do this exclusion for list partitions because that's the
+ * only case where we require all values to be explicitly specified.
+ */
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ /*
+ * First check if the datums in the query are in any of the partitions.
+ * If found, store their offsets in foundoffsets.
+ */
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc[i], partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /*
+ * We must ensure that we got clauses for all the values that a given list
+ * partition allows before we can eliminate the partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found in the query matches the number of datums allowed in the
+ * partition.
+ */
+ if (!bms_is_empty(foundoffsets))
+ {
+ datums_in_query = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_in_query[boundinfo->indexes[i]]++;
+ }
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we
+ * found clauses for all its permitted values. We must be careful
+ * here not to eliminate the default partition. We can recognize that
+ * by it having a zero value in both arrays.
+ */
+ if (datums_in_query)
+ {
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_in_query[i] >= datums_in_part[i] &&
+ datums_in_query[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+ }
+
+ /*
+ * Because the clauses from which ne_datums were extracted are all
+ * strict, we can also exclude the partition that accepts only NULL.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ if (datums_in_query)
+ pfree(datums_in_query);
+ pfree(datums_in_part);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
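
For readers following along, the counting idea in get_partitions_excluded_by_ne_datums() above can be tried outside the backend. What follows is only a minimal sketch, not the patch's code: it replaces Bitmapset with a 32-bit mask, and the names excluded_by_ne, datum_part, found and MAX_PARTS are made up for illustration. A partition is excluded only when every datum it accepts was named in a <> clause; the default partition owns no bound datums, so it is never excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limit only; the result must fit in a 32-bit mask. */
#define MAX_PARTS 32

/*
 * Hypothetical stand-in for a list-partitioned bound array:
 * datum_part[i] is the index of the partition accepting bound datum i,
 * and found[i] is true if datum i appeared in a <> clause.
 *
 * Returns a bitmask of partitions that can be excluded.  Assumes
 * nparts <= MAX_PARTS and 0 <= datum_part[i] < nparts.
 */
static uint32_t
excluded_by_ne(const int *datum_part, const bool *found, int ndatums,
               int nparts)
{
    int         in_query[MAX_PARTS] = {0};  /* datums named by <> clauses */
    int         in_part[MAX_PARTS] = {0};   /* datums accepted by partition */
    uint32_t    excluded = 0;
    int         i;

    /* Single pass over the bound datums to fill both counters. */
    for (i = 0; i < ndatums; i++)
    {
        in_part[datum_part[i]]++;
        if (found[i])
            in_query[datum_part[i]]++;
    }

    /*
     * Exclude a partition only if all of its datums were named.  A
     * partition with no bound datums (the default) has zero in both
     * counters and is therefore kept.
     */
    for (i = 0; i < nparts; i++)
    {
        if (in_query[i] > 0 && in_query[i] >= in_part[i])
            excluded |= (uint32_t) 1 << i;
    }

    return excluded;
}
```

With partition 0 accepting two datums, partition 1 accepting one, and partition 2 as the default, naming one datum from each of the first two partitions excludes only partition 1, since partition 0 still has an unnamed datum.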
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..cd8ac22960 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,49 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOpNe
+ */
+static PartitionPruneStepOpNe *
+_copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
+{
+ PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
+
+ COPY_NODE_FIELD(exprs);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(argsteps);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5067,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepOpNe:
+ retval = _copyPartitionPruneStepOpNe(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..fdab8725aa 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,32 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+
+ if (walker((Node *) nestep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep =
+ (PartitionPruneStepCombine *) node;
+
+ if (walker((Node *) cstep->argsteps, context))
+ return true;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2958,41 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+ PartitionPruneStepOpNe *newnode;
+
+ FLATCOPY(newnode, nestep, PartitionPruneStepOpNe);
+ MUTATE(newnode->exprs, nestep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ {
+ PartitionPruneStepCombine *cstep =
+ (PartitionPruneStepCombine *) node;
+ PartitionPruneStepCombine *newnode;
+
+ FLATCOPY(newnode, cstep, PartitionPruneStepCombine);
+ MUTATE(newnode->argsteps, cstep->argsteps, List *);
+
+ return (Node *) newnode;
+ }
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
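
The argsteps lists that the walker and mutator recurse into above form a tree of pruning steps, which perform_pruning_combine_step() evaluates recursively. The following is a rough sketch of that evaluation under simplifying assumptions: partition sets are 32-bit masks instead of Bitmapsets, and Step, StepKind and run_step are hypothetical names, not part of the patch. Unlike the patch, which starts the OR case from an empty set, this version seeds the result with the first argument for both operators, which gives the same answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the pruning step node types. */
typedef enum
{
    STEP_LEAF,                  /* base step with a precomputed result */
    STEP_OR,                    /* union of the argument steps */
    STEP_AND                    /* intersection of the argument steps */
} StepKind;

typedef struct Step
{
    StepKind    kind;
    uint32_t    parts;          /* STEP_LEAF: bitmask of matching partitions */
    const struct Step *const *args; /* STEP_OR / STEP_AND: argument steps */
    size_t      nargs;
} Step;

/* Returns the bitmask of partitions that survive 'step'. */
static uint32_t
run_step(const Step *step, uint32_t all_parts)
{
    uint32_t    result;
    size_t      i;

    if (step->kind == STEP_LEAF)
        return step->parts;

    /* A combine step without arguments means "prune nothing". */
    if (step->nargs == 0)
        return all_parts;

    /* Recurse, because an argument may itself be a combine step. */
    result = run_step(step->args[0], all_parts);
    for (i = 1; i < step->nargs; i++)
    {
        uint32_t    argresult = run_step(step->args[i], all_parts);

        if (step->kind == STEP_OR)
            result |= argresult;
        else
            result &= argresult;
    }

    return result;
}
```

For two leaf steps selecting partitions {0, 1} and {1, 2}, the OR step yields {0, 1, 2} and the AND step yields {1}.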
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
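
One detail of the set_append_rel_size() change above that is easy to miss: an empty live_children set is also what pruning produces when every partition is eliminated, which is why the separate did_pruning flag is needed. A small sketch of that distinction follows, assuming a 64-bit mask in place of Relids; count_scanned_children is a hypothetical helper written for this note, not a planner function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Counts the children that would still be planned.  'live_children' is a
 * bitmask indexed by RT index; assumes every RT index is below 64.
 */
static int
count_scanned_children(const int *child_relids, int nchildren,
                       bool did_pruning, uint64_t live_children)
{
    int         nscanned = 0;

    for (int i = 0; i < nchildren; i++)
    {
        uint64_t    bit = UINT64_C(1) << child_relids[i];

        /*
         * Skip a child only if pruning actually ran and did not select it.
         * Without the flag, "no quals, nothing pruned" and "all children
         * pruned" would both look like an empty set.
         */
        if (did_pruning && (live_children & bit) == 0)
            continue;

        nscanned++;
    }

    return nscanned;
}
```

With three children, no pruning keeps all three, while pruning that returns an empty set keeps none.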
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..7c00d62a56
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1471 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate steps needed for partition pruning
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning" as all the needed
+ * information is statically available in the query being planned.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ return generate_partition_pruning_steps_internal(rel, clauses,
+ constfalse);
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are to be used as a lookup key to search partitions with. Relevant
+ * details of the operator and a vector of (possibly cross-type) comparison
+ * functions are also included with each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that contains those
+ * steps.
+ *
+ * If, when going through clauses, we find any that are marked as pseudoconstant
+ * and contain a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. Caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber],
+ *ne_clauses = NIL;
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL,
+ *opsteps = NIL;
+ ListCell *lc;
+ int i;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ if (or_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *all_arg_steps = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ /*
+ * If arg was a clause matching this partition key, we'd
+ * get back the corresponding pruning step.
+ */
+ if (argsteps != NIL)
+ {
+ Assert(list_length(argsteps) == 1);
+ all_arg_steps = lappend(all_arg_steps,
+ linitial(argsteps));
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * argsteps to an empty List. However, if we can
+ * prove using constraint exclusion that the clause
+ * refutes the table's partition constraint (if it's
+ * sub-partitioned), we need not bother with that.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStepCombine *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = makeNode(PartitionPruneStepCombine);
+ orstep->combineOp = COMBINE_OR;
+ orstep->argsteps = NIL;
+ all_arg_steps = lappend(all_arg_steps, orstep);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_OR;
+ combineStep->argsteps = all_arg_steps;
+ result = lappend(result, combineStep);
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ PartitionPruneStepCombine *combineStep;
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partitions sets
+ * using a combine step.
+ */
+ argsteps = generate_partition_pruning_steps_internal(rel,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ combineStep = makeNode(PartitionPruneStepCombine);
+ combineStep->combineOp = COMBINE_AND;
+ combineStep->argsteps = argsteps;
+ result = lappend(result, combineStep);
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate a special pruning step designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+
+ generate_opsteps = true;
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *exprs = NIL;
+ List *cmpfns = NIL;
+ PartitionPruneStepOpNe *nestep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ exprs = lappend(exprs, pc->expr);
+ cmpfns = lappend_oid(cmpfns, pc->cmpfn);
+ }
+
+ nestep = makeNode(PartitionPruneStepOpNe);
+ nestep->exprs = exprs;
+ nestep->cmpfns = cmpfns;
+ result = lappend(result, nestep);
+ }
+
+ /* There was nothing but combining steps in the clauses we got. */
+ if (!generate_opsteps)
+ return result;
+
+ /*
+ * Now we have one list of clauses per partition key. We check here if
+ * we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NIL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys wouldn't
+ * be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may not
+ * be the last partition key column). Actually, the last
+ * element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys,
+ * which get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /*
+ * Generate one prune step for the information derived from IS NULL and IS
+ * NOT NULL clauses. Note that for IS NOT NULL clauses, simply having
+ * a step suffices; there is no need to propagate the exact details of which
+ * keys are required to be NOT NULL.
+ */
+ if (opsteps == NIL &&
+ (!bms_is_empty(nullkeys) || !bms_is_empty(notnullkeys)))
+ {
+ PartitionPruneStepOp *opstep;
+
+ opstep = makeNode(PartitionPruneStepOp);
+ opstep->nullkeys = nullkeys;
+ opsteps = lappend(opsteps, opstep);
+ }
+
+ /* Add opsteps to result. */
+ result = list_concat(result, opsteps);
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ if (exprType((Node *) expr) != part_scheme->partopcintype[partkeyidx])
+ {
+ int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+
+ cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprType((Node *) expr), procnum);
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ return PARTCLAUSE_UNSUPPORTED; /* no useful negator */
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps = generate_partition_pruning_steps_internal(rel,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStepOp *step = makeNode(PartitionPruneStepOp);
+
+ step->opstrategy = step_opstrategy;
+ step->exprs = list_make1(step_lastexpr);
+ step->cmpfns = list_make1_oid(step_lastcmpfn);
+ step->nullkeys = step_nullkeys;
+
+ return list_make1(step);
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStepOp *step;
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ step = makeNode(PartitionPruneStepOp);
+ step->opstrategy = step_opstrategy;
+ step->exprs = step_exprs1;
+ step->cmpfns = step_cmpfns1;
+ step->nullkeys = step_nullkeys;
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
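To make the combination logic in get_steps_using_prefix() easier to review, here is a minimal standalone sketch of the same idea. It is not the patch's code: Clause, Step, MAX_KEYS, MAX_STEPS, gen_steps() and steps_using_prefix() are hypothetical names, and plain ints stand in for the Expr pointers and comparison function OIDs. It assumes 'prefix' is sorted by key number, as the recursion above also expects, and emits one look-up key per combination of prefix clauses, each ending with the expression for the last key.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS  4     /* hypothetical cap, standing in for PARTITION_MAX_KEYS */
#define MAX_STEPS 64    /* hypothetical size of the caller's output array */

/* Hypothetical stand-in for PartClauseInfo: a key number and an "expression". */
typedef struct { int keyno; int expr; } Clause;

/* One generated step: a look-up key with one expression per key column. */
typedef struct { int nexprs; int exprs[MAX_KEYS]; } Step;

/*
 * Fill in key number 'ncur' from every prefix clause for that key, then
 * recurse for the next key.  Once all keys before last_keyno are filled in,
 * emit a step whose final expression is last_expr.  Returns the new number
 * of steps stored in 'out'.
 */
static int
gen_steps(const Clause *prefix, int nprefix, int start,
          int last_keyno, int last_expr,
          int *cur, int ncur, Step *out, int nout)
{
    if (ncur == last_keyno)
    {
        Step *s;

        assert(nout < MAX_STEPS);
        s = &out[nout++];
        for (int i = 0; i < ncur; i++)
            s->exprs[i] = cur[i];
        s->exprs[ncur] = last_expr;
        s->nexprs = ncur + 1;
        return nout;
    }

    /* 'prefix' is sorted by keyno, so stop at the first clause for a later key. */
    for (int i = start; i < nprefix && prefix[i].keyno <= ncur; i++)
    {
        if (prefix[i].keyno != ncur)
            continue;           /* leftover clause for an earlier key */
        cur[ncur] = prefix[i].expr;
        nout = gen_steps(prefix, nprefix, i + 1, last_keyno, last_expr,
                         cur, ncur + 1, out, nout);
    }
    return nout;
}

/* Entry point: with an empty prefix this yields a single one-expression step. */
static int
steps_using_prefix(const Clause *prefix, int nprefix,
                   int last_keyno, int last_expr, Step *out)
{
    int cur[MAX_KEYS];

    assert(last_keyno < MAX_KEYS);
    return gen_steps(prefix, nprefix, 0, last_keyno, last_expr, cur, 0, out, 0);
}
```

With two clauses for key 0 and two for key 1, calling steps_using_prefix() for key 2 produces four steps, one per pairing of the earlier clauses. The sketch copies the current combination into each emitted step, which plays the same role as the list_copy() calls above that leave step_exprs and step_cmpfns unmodified.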
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 093ca5208e..7c1b0de295 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 2faf0ca26e..fb29a66a64 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -73,4 +95,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 74b094a9c3..f208c11960 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -191,6 +191,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepOpNe,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..5a9b12b141 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,96 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/* The base Node type */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain up to partnatts
+ * elements.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepOpNe - Information to prune using a set of mutually AND'd
+ * OpExpr clauses each containing a <> operator
+ *
+ * This is a special form of PartitionPruneStepOp, where each of the
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
+ * partition, we must check if each of the values it allows matches the value
+ * of one of the expressions in 'exprs' using the corresponding comparison
+ * function in 'cmpfns'.
+ *
+ * Note: Since we must consider every possible value of the partition key a
+ * given partition may contain to be able to prune it using this step, we
+ * consider generating these only for list partitioned tables.
+ *----------
+ */
+typedef struct PartitionPruneStepOpNe
+{
+ PartitionPruneStep step;
+
+ List *exprs;
+ List *cmpfns;
+} PartitionPruneStepOpNe;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we determine the set of partitions for each of its
+ * argument clauses, and combine those sets as appropriate.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *argsteps;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
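As a reading aid for PartitionPruneStepCombine, the following is a rough sketch of how the partition sets produced by the argument steps might be combined. It is an assumption about the intended semantics (OR as set union, AND as set intersection), not the code of get_matching_partitions(), which is not shown in this part of the patch. CombineOp and combine_partsets() are hypothetical names, and a uint32_t bitmask stands in for a Bitmapset, with bit i meaning "partition i survives".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of PartitionPruneCombineOp. */
typedef enum { COMBINE_OR, COMBINE_AND } CombineOp;

/*
 * Combine the partition sets computed for each argument step.
 * 'allparts' is the mask of all partitions; it is the starting point for
 * AND, so that an AND over zero arguments prunes nothing.
 */
static uint32_t
combine_partsets(CombineOp op, const uint32_t *argsets, int nargs,
                 uint32_t allparts)
{
    uint32_t result = (op == COMBINE_AND) ? allparts : 0;

    assert(nargs >= 0);
    for (int i = 0; i < nargs; i++)
    {
        if (op == COMBINE_OR)
            result |= argsets[i];   /* union: any argument may match */
        else
            result &= argsets[i];   /* intersection: all arguments must match */
    }
    return result;
}
```

For instance, if three OR arms individually select hp3, hp2 and hp0 (masks 8, 4 and 1), the union is mask 13, which corresponds to the plan shape expected for the three-way OR query on hp in the regression output below.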
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..d75a23e4a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1089,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1103,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1125,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1141,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1153,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1276,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1339,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..71e86fc254 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1584,6 +1584,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1597,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepNoop
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
--
2.11.0
v40-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 3a243e7ec4a9b20db40766d6ba4e39ede7b4fbea Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v40 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 94 ++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 106 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cd8ac22960..5b765be011 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2304,21 +2304,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5095,9 +5080,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 50f858e420..f733075527 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -610,7 +610,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -625,6 +624,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1167,12 +1167,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1244,10 +1244,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1478,6 +1480,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1578,6 +1584,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1585,7 +1606,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6112,65 +6133,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f208c11960..fd97caad50 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 71e86fc254..6b8851509f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 2018/03/23 21:19, Amit Langote wrote:
On 2018/03/21 6:29, Robert Haas wrote:
+ /*
+  * If there are multiple pruning steps, we perform them one after another,
+  * passing the result of one step as input to another. Based on the type
+  * of pruning step, perform_pruning_step may add or remove partitions from
+  * the set of partitions it receives as the input.
+  */
The comment sounds great, but the code doesn't work that way; it
always calls bms_int_members to intersect the new result with any
previous result. I'm baffled as to how this manages to DTRT if
COMBINE_OR is used. In general I had hoped that the list of pruning
steps was something over which we were only going to iterate, not
recurse. This definitely recurses for the combine steps, but it's
still (sorta) got the idea of a list of iterable steps. That's a
weird mix.
At the top-level (in get_matching_partitions), it is assumed that the
steps in the input list come from implicitly AND'd clauses, so we take the
intersection of the partition sets that we get for each step.
Anyway, after David's rewrite of this portion of the patch incorporated in
the latest patch, things look a bit different here, although there is
still recursion for combine steps. I'm still considering how to make the
recursion go away.
I have managed to make the recursion go away in the attached updated
version. I guess that's the result of employing the idea of an "output
register" for individual pruning steps as mentioned in Robert's email
upthread where he detailed the "pruning steps" approach [1].
With the new patch, pruning steps for arguments of, say, an OR clause are
not performed recursively. Instead, each pruning step is performed
independently and its output is stored in a slot dedicated to it. A combine
step is always executed after all of the steps corresponding to its
arguments have been executed. That's ensured by the way steps are allocated.
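To make the "one slot per step" idea concrete, here is a minimal standalone sketch. It is not the code in the patch: the PruneStep struct, the uint32_t partition bitmask, MAX_STEPS, and run_prune_steps() are all made up for illustration, and taking the last step's slot as the final answer is an assumption of the sketch. The point is only that a single forward loop suffices when every combine step refers to steps with smaller indexes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified step representation; not the patch's structs. */
typedef enum
{
	STEP_OP,					/* result of matching one operator clause */
	STEP_COMBINE_AND,			/* intersect the results of argument steps */
	STEP_COMBINE_OR				/* union the results of argument steps */
} StepKind;

typedef struct PruneStep
{
	StepKind	kind;
	uint32_t	matches;		/* STEP_OP: bitmask of matching partitions */
	int			nargs;			/* combine steps: number of argument steps */
	int			args[4];		/* combine steps: indexes of earlier steps */
} PruneStep;

#define MAX_STEPS 16

/*
 * Execute the steps in array order, without recursion.  Each step writes
 * only to its own slot in results[].  A combine step reads the slots of its
 * arguments, which must have smaller indexes and so are already filled.
 * The sketch assumes the last step produces the final answer.
 */
uint32_t
run_prune_steps(const PruneStep *steps, int nsteps)
{
	uint32_t	results[MAX_STEPS];
	int			i,
				j;

	assert(nsteps > 0 && nsteps <= MAX_STEPS);

	for (i = 0; i < nsteps; i++)
	{
		const PruneStep *step = &steps[i];

		if (step->kind == STEP_OP)
		{
			results[i] = step->matches;
			continue;
		}

		assert(step->nargs > 0 && step->nargs <= 4);

		/* Start from "all partitions" for AND, "no partitions" for OR. */
		results[i] = (step->kind == STEP_COMBINE_AND) ? UINT32_MAX : 0;
		for (j = 0; j < step->nargs; j++)
		{
			int			arg = step->args[j];

			/* Arguments must already have been executed. */
			assert(arg >= 0 && arg < i);
			if (step->kind == STEP_COMBINE_AND)
				results[i] &= results[arg];
			else
				results[i] |= results[arg];
		}
	}

	return results[nsteps - 1];
}
```

For a clause shaped like (A OR B) AND C, the steps would be laid out as A, B, OR(0, 1), C, AND(2, 3), so that each combine step appears after its arguments.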
Jesper, off-list, reported an unused variable which has been removed in
the updated patch. Thanks Jesper! He also pointed out a case with a
list-partitioned table where pruning doesn't produce the result one
would expect and that constraint exclusion would produce.
create table lp (a char) partition by list (a);
create table lp_ad partition of lp for values in ('a', 'd');
create table lp_bc partition of lp for values in ('b', 'c');
create table lp_default partition of lp default;
explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on lp_ad
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
(7 rows)
One would expect that lp_ad is not scanned.
With the implementation where a > 'a' and a < 'd' are used to prune
separately, this cannot be avoided given the way
get_partitions_for_keys_list() added by the patch works. What happens is
we prune first with a step generated for a > 'a', which returns partitions
for all datums in the table's boundinfo greater than 'a' that have a
partition assigned, which means we'll include the partition that accepts
'd'. Then when pruning with a < 'd', we select partitions for all datums
less than 'd' that have a partition assigned, which means we end up
including the partition that accepts 'a'. We intersect the results of
running these independent steps, but lp_ad is present in the result sets
of both steps, so it ends up in the final result. Maybe there is a way
to fix that, but I haven't done anything about it yet.
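The effect can be reproduced with a small standalone sketch. This is not get_partitions_for_keys_list(): the arrays, bitmask constants, and function names below are hypothetical, and the default partition is left out. It models the bound datums of lp and shows that intersecting the per-clause results keeps lp_ad, whereas looking at both bounds together (parts_between(), shown only for contrast and not something the patch does) would leave just lp_bc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions for the partitions of the example table lp. */
#define LP_AD	(1u << 0)
#define LP_BC	(1u << 1)

/* Sorted list bound datums and the partition that accepts each of them. */
#define NBOUNDS 4
static const char bound_datums[NBOUNDS] = {'a', 'b', 'c', 'd'};
static const uint32_t bound_parts[NBOUNDS] = {LP_AD, LP_BC, LP_BC, LP_AD};

/* Partitions accepting some bound datum strictly greater than val. */
uint32_t
parts_greater_than(char val)
{
	uint32_t	result = 0;
	int			i;

	for (i = 0; i < NBOUNDS; i++)
		if (bound_datums[i] > val)
			result |= bound_parts[i];
	return result;
}

/* Partitions accepting some bound datum strictly less than val. */
uint32_t
parts_less_than(char val)
{
	uint32_t	result = 0;
	int			i;

	for (i = 0; i < NBOUNDS; i++)
		if (bound_datums[i] < val)
			result |= bound_parts[i];
	return result;
}

/*
 * For contrast only: partitions accepting some bound datum d with
 * lo < d < hi, that is, both clauses applied to the same datum.
 */
uint32_t
parts_between(char lo, char hi)
{
	uint32_t	result = 0;
	int			i;

	assert(lo <= hi);
	for (i = 0; i < NBOUNDS; i++)
		if (bound_datums[i] > lo && bound_datums[i] < hi)
			result |= bound_parts[i];
	return result;
}
```

With these definitions, parts_greater_than('a') picks up lp_ad through 'd', parts_less_than('d') picks it up through 'a', and so their intersection still contains LP_AD; parts_between('a', 'd') sees only 'b' and 'c' and yields LP_BC alone.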
Thanks,
Amit
[1]: /messages/by-id/CA+TgmoahUxagjeNeJTcJkD0rbk+mHTXROzWcEd+tZ8DuQG83cg@mail.gmail.com
Attachments:
v41-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From d8cc2bfb349d224cd1a9d832cb9557bd8dbd834f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v41 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index bd3a0c4a0a..093ca5208e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v41-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 1f85a20b2551c278caa6cd632dd387dfc560d2b2 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v41 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v41-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 40dd355eae67ec084286e2d57e487d1f52f8971d Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v41 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 997 +++++++++++++++
src/backend/nodes/copyfuncs.c | 52 +
src/backend/nodes/nodeFuncs.c | 46 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1663 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 96 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 318 ++++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 6 +
18 files changed, 3276 insertions(+), 86 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index b00a986432..23730651c9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -197,6 +197,29 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static Bitmapset *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static Bitmapset *perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep);
+static Bitmapset *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static Bitmapset *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1619,9 +1642,983 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ Bitmapset **step_results;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (list_length(pruning_steps) == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for the individual pruning steps to store their results. Each
+ * slot will hold a Bitmapset of the partition indexes that are selected
+ * after performing a given pruning step. Later steps may use the result
+ * of one or more earlier steps. Result of this function (that is, of
+ * applying all pruning steps) is the value contained in the slot of the
+ * last pruning step.
+ */
+ step_results = (Bitmapset **)
+ palloc0(list_length(pruning_steps) * sizeof(Bitmapset *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+ case T_PartitionPruneStepOpNe:
+ step_results[step->step_id] =
+ perform_pruning_base_step_ne(context,
+ (PartitionPruneStepOpNe *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+
+ result = step_results[step->step_id];
+ }
+
+ return result;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Returns indexes of partitions obtained by executing 'cstep'.
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static Bitmapset *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ Bitmapset **step_results)
+{
+ ListCell *lc1;
+ Bitmapset *result = NULL;
+
+ /*
+ * In some cases, planner generates a combine step that
+ * doesn't contain any argument steps, to signal us to not
+ * prune any partitions. So, return all partitions in that
+ * case.
+ */
+ if (cstep->source_stepids == NIL)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+ else
+ {
+ bool firststep = true;
+
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+
+ /*
+ * step_results[step_id] must contain valid result, which is
+ * confirmed by the fact that cstep's ID is greater than
+ * step_id.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ result = bms_add_members(result, step_results[step_id]);
+ break;
+
+ case COMBINE_AND:
+ {
+ if (firststep)
+ {
+ result = step_results[step_id];
+ firststep = false;
+ }
+ else
+ result = bms_int_members(result,
+ step_results[step_id]);
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Returns indexes of partitions obtained by executing 'opstep'.
+ */
+static Bitmapset *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * get_partitions_for_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ if (lc2 == NULL)
+ elog(ERROR, "incomplete cmpfnids list in pruning step");
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_base_step_ne
+ * Returns indexes of partitions obtained by executing 'nestep'.
+ */
+static Bitmapset *
+perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep)
+{
+ Bitmapset *excluded_parts,
+ *result;
+ Datum *ne_datums;
+ FmgrInfo **partsupfunc;
+ ListCell *lc1;
+ ListCell *lc2;
+ int n_ne_datums = list_length(nestep->exprs),
+ i;
+
+ /*
+ * Apply not-equal clauses. This only applies in the list
+ * partitioning case as this is the only partition type where
+ * we have knowledge of the entire set of values that can be
+ * stored in a given partition.
+ */
+ ne_datums = (Datum *) palloc(n_ne_datums * sizeof(Datum));
+
+ /*
+ * Some datums may require different comparison function than
+ * the default partitioning-specific one.
+ */
+ partsupfunc = (FmgrInfo **)
+ palloc(n_ne_datums * sizeof(FmgrInfo *));
+
+ i = 0;
+ forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ Oid cmpfn = lfirst_oid(lc2);
+ Datum datum;
+
+ /*
+ * Note that we only ever look at partition key 0 here, because
+ * there can be only one partition key with list partitioning.
+ */
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ /*
+ * If this datum is not the same type as the partition
+ * key then we'll need to use the comparison function
+ * for that type. We'll need to lookup the FmgrInfo.
+ */
+ if (cmpfn != context->partsupfunc[0].fn_oid)
+ {
+ partsupfunc[i] = (FmgrInfo *) palloc(sizeof(FmgrInfo));
+ fmgr_info(cmpfn, partsupfunc[i]);
+ }
+ else
+ partsupfunc[i] = &context->partsupfunc[0];
+
+ ne_datums[i++] = datum;
+ }
+ }
+
+ excluded_parts = get_partitions_excluded_by_ne_datums(context, ne_datums, i,
+ partsupfunc);
+ pfree(ne_datums);
+
+ /* All partitions apart from those in excluded_parts match */
+ result = bms_add_range(NULL, 0, context->nparts - 1);
+ return bms_del_members(result, excluded_parts);
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Determine the minimum set of partitions matching the specified values
+ * using hash partitioning.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ * 'values' contains the values to be used for pruning, each stored at the
+ * array position of its partition key.
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus,
+ result_index;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+ result_index = partindices[rowHash % greatest_modulus];
+
+ if (result_index >= 0)
+ return bms_make_singleton(result_index);
+
+ return NULL;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the minimum set of partitions matching the specified values
+ * using list partitioning.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because list partitioning
+ * supports only a single partition key.
+ * 'value' contains the value to use for pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains list partitioning comparison function to be used to
+ * perform partition_list_bsearch
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result;
+ int default_index = boundinfo->default_index;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ return bms_make_singleton(boundinfo->null_index);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL; /* shouldn't happen */
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result = bms_make_singleton(default_index);
+ else
+ result = NULL;
+
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ /* An exact matching datum exists. */
+ Assert(partindices[off] >= 0);
+ return bms_make_singleton(partindices[off]);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* Skip the datum at off unless it is an exact, inclusive match. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the number of datums we have partitions
+ * for, the only possible partition that could contain a match is
+ * the default partition, which is already set in 'result' if one
+ * exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is negative, the value is smaller than the datums of all
+ * non-default partitions. The only possible partition that could
+ * contain a match is the default partition, which is already set in
+ * 'result' if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /* Finally add the partition indexes. */
+ for (i = minoff; i <= maxoff; i++)
+ result = bms_add_member(result, partindices[i]);
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the minimum set of partitions matching the specified values
+ * using range partitioning.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains range partitioning comparison function to be used to
+ * perform partition_range_datum_bsearch or partition_rbound_datum_cmp
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int default_index = boundinfo->default_index;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+ Bitmapset *result = NULL;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+ if (nvalues == 0)
+ {
+ /*
+ * Add indexes of *all* partitions containing non-null values and
+ * return.
+ */
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result = bms_add_range(result, partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be one partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(partindices[off + 1]);
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /* Matched a prefix of the partition bound at off. */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off + 1;
+ }
+ }
+ else if (off >= 0)
+ {
+ if (partindices[off + 1] >= 0)
+ minoff = maxoff = off + 1;
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ else
+ return NULL;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so include
+ * all partitions in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /* Matched a prefix of the partition bound at off. */
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /* Matched prefix of the partition bound at off. */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ return bms_make_singleton(default_index);
+ return NULL;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ minoff++;
+ }
+
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ result = bms_add_member(result, default_index);
+
+ maxoff--;
+ }
+
+ if (minoff <= maxoff)
+ result = bms_add_range(result,
+ partindices[minoff],
+ partindices[maxoff]);
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * Since partition keys with nulls are mapped to the default range
+ * partition, we must include the default partition if some keys could
+ * be null.
+ */
+ if (nvalues < partnatts)
+ result = bms_add_member(result, default_index);
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ return bms_add_member(result, default_index);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * get_partitions_excluded_by_ne_datums
+ *
+ * Returns a Bitmapset of partition indexes that can safely be removed due to
+ * the discovery of <> clauses for each datum value allowed in the partition.
+ */
+static Bitmapset *
+get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
+ Datum *ne_datums, int n_ne_datums,
+ FmgrInfo **partsupfunc)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int nparts = context->nparts,
+ *datums_in_query = NULL,
+ *datums_in_part = NULL,
+ i;
+ Bitmapset *excluded_parts = NULL;
+ Bitmapset *foundoffsets = NULL;
+
+ /*
+ * We can only do this exclusion for list partitions because that's the
+ * only case where we require all values to be explicitly specified.
+ */
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ /*
+ * First check if the datums in the query are in any of the partitions.
+ * If found, store their offsets in foundoffsets.
+ */
+ for (i = 0; i < n_ne_datums; i++)
+ {
+ int offset;
+ bool is_equal;
+
+ offset = partition_list_bsearch(partsupfunc[i], partcollation,
+ boundinfo,
+ ne_datums[i], &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+
+ /*
+ * We must ensure that we got clauses for all the values that a given list
+ * partition allows before we can eliminate the partition.
+ *
+ * We'll need two arrays for this, one to count the number of unique
+ * datums found in the query which belong to each partition, and another
+ * to record the number of datums permitted in each partition. Once we've
+ * counted all this, we can eliminate any partition where the number of
+ * datums found in the query matches the number of datums allowed in the
+ * partition.
+ */
+ if (!bms_is_empty(foundoffsets))
+ {
+ datums_in_query = (int *) palloc0(sizeof(int) * nparts);
+
+ i = -1;
+ while ((i = bms_next_member(foundoffsets, i)) >= 0)
+ datums_in_query[boundinfo->indexes[i]]++;
+ }
+
+ /*
+ * Now, in a single pass over all the datums, count the number of datums
+ * permitted in each partition.
+ */
+ datums_in_part = (int *) palloc0(sizeof(int) * nparts);
+ for (i = 0; i < boundinfo->ndatums; i++)
+ datums_in_part[boundinfo->indexes[i]]++;
+
+ /*
+ * Now compare the counts and eliminate any partition for which we
+ * found clauses for all its permitted values. We must be careful
+ * here not to eliminate the default partition. We can recognize that
+ * by it having a zero value in both arrays.
+ */
+ if (datums_in_query)
+ {
+ for (i = 0; i < nparts; i++)
+ {
+ if (datums_in_query[i] >= datums_in_part[i] &&
+ datums_in_query[i] > 0)
+ excluded_parts = bms_add_member(excluded_parts, i);
+ }
+ }
+
+ /*
+ * Because the clauses from which ne_datums were extracted are all
+ * strict, we can also exclude the NULL (-only!) partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ datums_in_part[boundinfo->null_index] == 0)
+ excluded_parts = bms_add_member(excluded_parts,
+ boundinfo->null_index);
+
+ if (datums_in_query)
+ pfree(datums_in_query);
+ pfree(datums_in_part);
+
+ return excluded_parts;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..ddc3846771 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,49 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOpNe
+ */
+static PartitionPruneStepOpNe *
+_copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
+{
+ PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
+
+ COPY_NODE_FIELD(exprs);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5067,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepOpNe:
+ retval = _copyPartitionPruneStepOpNe(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..94d57f1c17 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,26 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+
+ if (walker((Node *) nestep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2952,32 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+ PartitionPruneStepOpNe *newnode;
+
+ FLATCOPY(newnode, nestep, PartitionPruneStepOpNe);
+ MUTATE(newnode->exprs, nestep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..7dfd8fdcaf
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1663 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate steps needed for partition pruning
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values will be used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate a PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find any that is marked as pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. Caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * source_stepids to an empty List.
+ *
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_OR);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_OR));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_AND));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause contains an operator named <>, we generate
+ * a special pruning step designed to handle such clauses,
+ * so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *exprs = NIL;
+ List *cmpfns = NIL;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ exprs = lappend(exprs, pc->expr);
+ cmpfns = lappend_oid(cmpfns, pc->cmpfn);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_op_ne(context, exprs, cmpfns));
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there is more than one.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_AND));
+ }
+
+ return result;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys wouldn't
+ * be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may not
+ * be the last partition key column). Actually, the last
+ * element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys,
+ * which get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_AND);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ if (exprType((Node *) expr) != part_scheme->partopcintype[partkeyidx])
+ {
+ int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+
+ cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprType((Node *) expr), procnum);
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing a true or false value and returns true
+ * if we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause, or its negation, matches the partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to the global context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns)
+{
+ PartitionPruneStepOpNe *nestep = makeNode(PartitionPruneStepOpNe);
+
+ nestep->step.step_id = context->next_step_id++;
+ nestep->exprs = exprs;
+ nestep->cmpfns = cmpfns;
+
+ context->steps = lappend(context->steps, nestep);
+
+ return (Node *) nestep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 093ca5208e..7c1b0de295 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..df9a6ea669 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepOpNe,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..959964ba13 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,100 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain up to partnatts
+ * elements.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepOpNe - Information to prune using a set of mutually AND'd
+ * OpExpr clauses each containing a <> operator
+ *
+ * This is a special form of PartitionPruneStepOp, where each of the
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
+ * partition, we must check if each of the values it allows matches the value
+ * of one of the expressions in 'exprs' using the corresponding comparison
+ * function in 'cmpfns'.
+ *
+ * Note: Since we must consider every possible value of the partition key a
+ * given partition may contain to be able to prune it using this step, we
+ * consider generating these only for list partitioned tables.
+ *----------
+ */
+typedef struct PartitionPruneStepOpNe
+{
+ PartitionPruneStep step;
+
+ List *exprs;
+ List *cmpfns;
+} PartitionPruneStepOpNe;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..d75a23e4a6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -24,11 +24,13 @@ explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
+ -> Seq Scan on lp_ad
+ Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-(5 rows)
+(7 rows)
explain (costs off) select * from lp where a > 'a' and a <= 'd';
QUERY PLAN
@@ -208,16 +210,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +235,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +265,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +577,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +718,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +894,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +906,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +967,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1013,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1036,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1089,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1103,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1125,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1141,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1153,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1276,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1339,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..71e86fc254 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1584,6 +1584,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1597,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepNoop
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
--
2.11.0
Attachment: v41-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 1b9de6dc2405da69ffbc86f08bc62a4709032a8e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v41 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 94 ++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 106 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index ddc3846771..6dd525323c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2304,21 +2304,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5095,9 +5080,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6870..00d6252552 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -612,7 +612,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -627,6 +626,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1169,12 +1169,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1246,10 +1246,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1480,6 +1482,10 @@ inheritance_planner(PlannerInfo *root)
if (IS_DUMMY_PATH(subpath))
continue;
+ /* Add the current parent's RT index to the partitioned rels set. */
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
/*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
@@ -1580,6 +1586,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1587,7 +1608,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6114,65 +6135,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index df9a6ea669..52dec2e5ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 71e86fc254..6b8851509f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 27 March 2018 at 00:42, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Also, I started thinking that implementing pruning using <> operators with
> a PartitionPruneCombineOp was not such a great idea. That needed us to
> add argexprs and argcmpfns to that struct, which seemed a bit odd. I
> defined a new pruning node type called PartitionPruneStepOpNe, which still
> seems a bit odd, but given that our support for pruning using <> is quite
> specialized, that may be fine.
Seems better.
> I added a bunch of hopefully informative comments in partprune.c and for
> the struct definitions of pruning step nodes.
Yes. That looks better.
> Please find attached a new version.
Thanks. I've made a pass over this and I only have the attached set of
fixes and the following to show for it.
1. Please add more comments in the switch statement in
get_partitions_for_keys_range
2. More an observation than anything else. I see we've lost the
ability to prune range queries on LIST partitions in some cases.
For example:
CREATE TABLE listp (a INT) PARTITION BY LIST(a);
CREATE TABLE listp1_3 PARTITION OF listp FOR VALUES IN(1,3);
EXPLAIN SELECT * FROM listp WHERE a > 1 AND a < 3;
This is just down to the new pruning step design, where we first prune
on "a > 1", which matches listp1_3 due to 3, then binary-AND that with the
result of "a < 3", which matches listp1_3 due to 1. This is a
shame, but probably not the end of the world. Fixing it would likely
mean moving back towards the previous design.
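
To make the listp1_3 case concrete, here is a minimal standalone sketch of
the effect. It is not the code from the patch: ListDatum, prune_step,
prune_stepwise and prune_combined are hypothetical names, partitions are
modelled as bits in a uint32_t (so at most 32 partitions), and only the <
and > operators on int values are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one LIST bound datum and the partition accepting it. */
typedef struct ListDatum
{
	int			value;
	int			partindex;		/* assumed to be in the range 0..31 */
} ListDatum;

typedef enum CmpOp
{
	OP_LT,
	OP_GT
} CmpOp;

static bool
datum_satisfies(int value, CmpOp op, int constant)
{
	return (op == OP_LT) ? (value < constant) : (value > constant);
}

/*
 * One pruning step: the set of partitions that accept at least one datum
 * satisfying "a op constant", considered in isolation.
 */
uint32_t
prune_step(const ListDatum *datums, size_t n, CmpOp op, int constant)
{
	uint32_t	result = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (datum_satisfies(datums[i].value, op, constant))
			result |= UINT32_C(1) << datums[i].partindex;
	}
	return result;
}

/* Step-wise design: evaluate each clause separately, then bitwise-AND. */
uint32_t
prune_stepwise(const ListDatum *datums, size_t n, int lower, int upper)
{
	return prune_step(datums, n, OP_GT, lower) &
		prune_step(datums, n, OP_LT, upper);
}

/* Combined evaluation: a single datum must satisfy both clauses at once. */
uint32_t
prune_combined(const ListDatum *datums, size_t n, int lower, int upper)
{
	uint32_t	result = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (datums[i].value > lower && datums[i].value < upper)
			result |= UINT32_C(1) << datums[i].partindex;
	}
	return result;
}

/* The example from above: listp1_3 (partition 0) accepts 1 and 3. */
void
listp1_3_example(void)
{
	const ListDatum listp[] = {{1, 0}, {3, 0}};

	/* "a > 1" keeps it because of 3, "a < 3" keeps it because of 1. */
	assert(prune_stepwise(listp, 2, 1, 3) == UINT32_C(1));
	/* No single accepted value lies strictly between 1 and 3. */
	assert(prune_combined(listp, 2, 1, 3) == 0);
}
```

In this model the step-wise result keeps partition 0 while the combined
check prunes it, which is the loss described above. The combined variant
only works because both clauses are looked at together, which is roughly
what the previous design did.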
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v40_drowley_fixes.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 0e47d9a0de1..5d44644bdf5 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1668,6 +1668,8 @@ perform_pruning_base_step(PartitionPruneContext *context,
Datum values[PARTITION_MAX_KEYS];
FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
nvalues = 0;
lc1 = list_head(opstep->exprs);
lc2 = list_head(opstep->cmpfns);
@@ -1708,8 +1710,6 @@ perform_pruning_base_step(PartitionPruneContext *context,
* than the one cached in the PartitionKey, we'll need to
* look up the FmgrInfo.
*/
- if (lc2 == NULL)
- elog(ERROR, "incomplete cmpfnids list in pruning step");
cmpfn = lfirst_oid(lc2);
Assert(OidIsValid(cmpfn));
if (cmpfn != context->partsupfunc[keyno].fn_oid)
@@ -1757,6 +1757,8 @@ perform_pruning_base_step(PartitionPruneContext *context,
/*
* perform_pruning_base_step_ne
* Returns indexes of partitions obtained by executing 'nestep'.
+ *
+ * Note this pruning method is only supported by LIST partitioning.
*/
static Bitmapset *
perform_pruning_base_step_ne(PartitionPruneContext *context,
@@ -1771,6 +1773,9 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
int n_ne_datums = list_length(nestep->exprs),
i;
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(list_length(nestep->exprs) == list_length(nestep->cmpfns));
+
/*
* Apply not-equal clauses. This only applies in the list
* partitioning case as this is the only partition type where
@@ -1783,8 +1788,7 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
* Some datums may require different comparison function than
* the default partitioning-specific one.
*/
- partsupfunc = (FmgrInfo **)
- palloc(n_ne_datums * sizeof(FmgrInfo *));
+ partsupfunc = (FmgrInfo **) palloc(n_ne_datums * sizeof(FmgrInfo *));
i = 0;
forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
@@ -1811,22 +1815,21 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
}
else
partsupfunc[i] = &context->partsupfunc[0];
- }
- ne_datums[i++] = datum;
+ ne_datums[i++] = datum;
+ }
}
excluded_parts = get_partitions_excluded_by_ne_datums(context, ne_datums, i,
partsupfunc);
pfree(ne_datums);
+ pfree(partsupfunc);
/* All partitions apart from those in excluded_parts match */
result = bms_add_range(NULL, 0, context->nparts - 1);
return bms_del_members(result, excluded_parts);
}
-
-
/*
* perform_pruning_combine_step
* Returns indexes of partitions obtained by executing 'cstep'.
@@ -1839,12 +1842,12 @@ perform_pruning_combine_step(PartitionPruneContext *context,
ListCell *lc;
/*
- * In some cases, planner generates a combine step that doesn't contain
+ * In some cases the planner generates a combine step that doesn't contain
* any argument steps, to signal us to not prune any partitions. So,
* return all partitions in that case.
*/
if (cstep->argsteps == NIL)
- return bms_add_range(NULL, 0, context->nparts - 1);
+ return bms_add_range(NULL, 0, context->nparts - 1);
switch (cstep->combineOp)
{
@@ -1863,7 +1866,10 @@ perform_pruning_combine_step(PartitionPruneContext *context,
*/
argresult = perform_pruning_step(context, step);
- /* Add argresult to result. */
+ /*
+ * Accumulate the matching partitions, effectively
+ * bitwise-OR.
+ */
result = bms_add_members(result, argresult);
}
@@ -1876,6 +1882,11 @@ perform_pruning_combine_step(PartitionPruneContext *context,
firststep = true;
result = NULL;
+
+ /*
+ * Determine the partitions which are common to each step
+ * using bitwise-AND.
+ */
foreach(lc, cstep->argsteps)
{
PartitionPruneStep *step = lfirst(lc);
@@ -1883,6 +1894,11 @@ perform_pruning_combine_step(PartitionPruneContext *context,
argresult = perform_pruning_step(context, step);
+ /*
+ * For the first step we take the entire result so that
+ * we have something to bitwise-AND to on subsequent
+ * steps.
+ */
if (firststep)
{
result = argresult;
@@ -2236,7 +2252,7 @@ get_partitions_for_keys_range(PartitionPruneContext *context,
{
if (nvalues == partnatts)
{
- /* There can only be one partition. */
+ /* There can only be zero or one matching partitions. */
if (partindices[off + 1] >= 0)
return bms_make_singleton(partindices[off + 1]);
else if (partition_bound_has_default(boundinfo))
@@ -2491,7 +2507,8 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
/*
* We can only do this exclusion for list partitions because that's the
- * only case where we require all values to explicitly specified.
+ * only case where we require all values to be explicitly specified in the
+ * partition boundinfo.
*/
Assert(context->strategy == PARTITION_STRATEGY_LIST);
Assert(context->partnatts == 1);
@@ -2525,6 +2542,12 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
* counted all this, we can eliminate any partition where the number of
* datums found in the query matches the number of datums allowed in the
* partition.
+ *
+ * It might seem like we can skip attempting pruning any partitions here
+ * if we found no not-equal clauses which match any partitions above, but
+ * because we previously ensured these clauses are strict we may be able
+ * to at least eliminate the NULL partition, providing that partition does
+ * not also allow any non-NULL values.
*/
if (!bms_is_empty(foundoffsets))
{
@@ -2535,6 +2558,13 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
datums_in_query[boundinfo->indexes[i]]++;
}
+ /*
+ * We can't prune anything if there's no foundoffsets and no NULL
+ * partition.
+ */
+ else if (!partition_bound_accepts_nulls(boundinfo))
+ return NULL;
+
/*
* Now, in a single pass over all the datums, count the number of datums
* permitted in each partition.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 5b765be0114..2053a114f4e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2158,6 +2158,7 @@ _copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
return newnode;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f733075527b..24592e2205e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1481,8 +1481,9 @@ inheritance_planner(PlannerInfo *root)
continue;
/* Add the current parent's RT index to the partitioned rels set. */
- partitioned_relids = bms_add_member(partitioned_relids,
- appinfo->parent_relid);
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
/*
* If this is the first non-excluded child, its post-planning rtable
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 7c00d62a56a..92ca13f6936 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -197,7 +197,7 @@ generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
* From OpExpr clauses that are mutually AND'd, we find combinations of those
* that match to the partition key columns and for every such combination,
* we emit a PartitionPruneStepOp containing a vector of expressions whose
- * values to be used as a look up key to search partitions with. Relevant
+ * values are used as a look up key to search for partitions with. Relevant
* details of the operator and a vector of (possibly cross-type) comparison
* functions is also included with each step.
*
@@ -622,8 +622,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
}
/*
- * If we've decided that clauses for subsequent partition keys would't
- * be useful for pruning, don't look.
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
*/
if (!consider_next_key)
break;
@@ -759,9 +759,9 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
ListCell *lc1;
/*
- * Locate the clause for the greatest column (whic may not
- * be the last partition key column). Actually, the last
- * element of eq_clauses must give us what we need.
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
*/
pc = llast(eq_clauses);
@@ -857,7 +857,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel, List *clauses,
*
* Return value:
*
- * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
* matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
* this means the clause is self-contradictory (which can happen only if it's
* a BoolExpr whose arguments may be self-contradictory)
@@ -927,6 +927,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
Oid commutator = InvalidOid,
negator = InvalidOid;
Oid cmpfn;
+ Oid exprtype;
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
@@ -1014,7 +1015,8 @@ match_clause_to_partition_key(RelOptInfo *rel,
}
/* Check if we're going to need a cross-type comparison function. */
- if (exprType((Node *) expr) != part_scheme->partopcintype[partkeyidx])
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
{
int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
? HASHEXTENDED_PROC
@@ -1022,7 +1024,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
part_scheme->partopcintype[partkeyidx],
- exprType((Node *) expr), procnum);
+ exprtype, procnum);
/* If we couldn't find one, we cannot use this expression. */
if (!OidIsValid(cmpfn))
return PARTCLAUSE_UNSUPPORTED;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 5a9b12b1414..400e79bd4f8 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1524,10 +1524,10 @@ typedef struct PartitionPruneStep
* where partnatts is the number of partition key columns. 'opstrategy' is the
* strategy of the operator in the clause matched to the last partition key.
* 'exprs' contains expressions which comprise the look-up key to be passed to
- * the partition bound search function. 'cmpfnids' contains the OIDs of
+ * the partition bound search function. 'cmpfns' contains the OIDs of
* comparison function used to compare aforementioned expressions with
- * partition bounds. Both 'exprs' and 'cmpfns' contain up to partnatts
- * elements.
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
*
* Once we find the offset of a partition bound using the look-up key, we
* determine which partitions to include in the result based on the value of
@@ -1559,9 +1559,9 @@ typedef struct PartitionPruneStepOp
* OpExpr clauses each containing a <> operator
*
* This is a special form of PartitionPruneStepOp, where each of the
- * expressions in 'expr' is compared using a <> operator. To prune a given
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
* partition, we must check if each of the values it allows matches the value
- * of one of the expressions in 'expr' using the corresponding comparison
+ * of one of the expressions in 'exprs' using the corresponding comparison
* function in 'cmpfns'.
*
* Note: Since we must consider every possible value of the partition key a
Hi Amit,
On 03/27/2018 06:42 AM, Amit Langote wrote:
I have managed to make the recursion go away in the attached updated
version. I guess that's the result of employing the idea of an "output
register" for individual pruning steps as mentioned in Robert's email
upthread where he detailed the "pruning steps" approach [1].
With the new patch, pruning steps for arguments of, say, an OR clause are
not performed recursively. Instead, each pruning step is performed
independently and its output is stored in a slot dedicated to it. Combine
steps are always executed after all of the steps corresponding to their
arguments have been executed. That's ensured by the way steps are allocated.
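As a rough illustration of the slot-per-step scheme quoted above (this is not code from the patch; the PruneStep struct, the step kinds, and run_steps() are names made up for the example, and a plain 32-bit mask stands in for a Bitmapset), the steps can be run in a single loop as long as every combine step is allocated after its arguments:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step kinds; illustrative only, not the patch's node types. */
typedef enum { STEP_BASE, STEP_AND, STEP_OR } StepKind;

typedef struct PruneStep
{
    StepKind kind;
    uint32_t base_result;   /* STEP_BASE: set of matching indexes, as a mask */
    int      nargs;         /* combine steps: number of argument steps */
    int      args[4];       /* combine steps: slot numbers of the arguments */
} PruneStep;

/*
 * Run the steps in array order.  Step i writes only to results[i], and a
 * combine step reads only slots with a smaller number, so no recursion is
 * needed.  Returns the output of the last step.
 */
static uint32_t
run_steps(const PruneStep *steps, int nsteps, uint32_t *results)
{
    assert(nsteps > 0);

    for (int i = 0; i < nsteps; i++)
    {
        const PruneStep *s = &steps[i];

        if (s->kind == STEP_BASE)
            results[i] = s->base_result;
        else
        {
            /* AND starts from "everything", OR starts from "nothing". */
            uint32_t acc = (s->kind == STEP_AND) ? UINT32_MAX : 0;

            for (int j = 0; j < s->nargs; j++)
            {
                /* Arguments must already have been executed. */
                assert(s->args[j] < i);
                if (s->kind == STEP_AND)
                    acc &= results[s->args[j]];
                else
                    acc |= results[s->args[j]];
            }
            results[i] = acc;
        }
    }
    return results[nsteps - 1];
}
```

For a qual shaped like (A OR B) AND C, the steps would be laid out as A, B, OR(0, 1), C, AND(2, 3), so each combine step finds its inputs already sitting in their slots.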
Running v41 with "partition_prune" under valgrind gives the attached report.
Best regards,
Jesper
Amit Langote wrote:
[Jesper] also pointed out a case with a
list-partitioned table where pruning doesn't produce a result as one
would expect and what constraint exclusion would produce.
create table lp (a char) partition by list (a);
create table lp_ad partition of lp for values in ('a', 'd');
create table lp_bc partition of lp for values in ('b', 'c');
create table lp_default partition of lp default;
explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on lp_ad
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
(7 rows)
One would expect that lp_ad is not scanned.
One would? I, for one, wouldn't particularly sweat over this case TBH.
It seems a pretty silly case. If this works for "a <> 'a' and a <> 'd'"
(I mean, lp_ad is pruned for that qual), that sounds sufficient to me.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Amit Langote wrote:
[Jesper] also pointed out a case with a
list-partitioned table where pruning doesn't produce a result as one
would expect and what constraint exclusion would produce.
create table lp (a char) partition by list (a);
create table lp_ad partition of lp for values in ('a', 'd');
create table lp_bc partition of lp for values in ('b', 'c');
create table lp_default partition of lp default;
explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on lp_ad
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
(7 rows)
One would expect that lp_ad is not scanned.
One would? I, for one, wouldn't particularly sweat over this case TBH.
That example works in HEAD, so if somebody is proposing a patch that
breaks it, seems like that needs investigation.
regards, tom lane
Hi,
On 03/27/2018 01:46 PM, Jesper Pedersen wrote:
Running v41 with "partition_prune" under valgrind gives the attached
report.
The reports mostly involve interaction with catcache.c and dynahash.c,
so something for a separate thread.
Best regards,
Jesper
On 27 March 2018 at 23:42, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I have managed to make the recursion go away in the attached updated
version. I guess that's the result of employing the idea of an "output
register" for individual pruning steps as mentioned in Robert's email
upthread where he detailed the "pruning steps" approach [1].
Thanks for making that work. I've only glanced at the patch, and not
taken enough time to understand how the new parts work yet.
In the meantime, I've attached some fixes for v41 which I previously
submitted for v40.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v41_drowley_fixes.patch
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 1b68ab1b82..cf396856a9 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1790,6 +1790,8 @@ perform_pruning_base_step(PartitionPruneContext *context,
Datum values[PARTITION_MAX_KEYS];
FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
nvalues = 0;
lc1 = list_head(opstep->exprs);
lc2 = list_head(opstep->cmpfns);
@@ -1830,8 +1832,6 @@ perform_pruning_base_step(PartitionPruneContext *context,
* than the one cached in the PartitionKey, we'll need to
* look up the FmgrInfo.
*/
- if (lc2 == NULL)
- elog(ERROR, "incomplete cmpfnids list in pruning step");
cmpfn = lfirst_oid(lc2);
Assert(OidIsValid(cmpfn));
if (cmpfn != context->partsupfunc[keyno].fn_oid)
@@ -1879,6 +1879,8 @@ perform_pruning_base_step(PartitionPruneContext *context,
/*
* perform_pruning_base_step_ne
* Returns indexes of partitions obtained by executing 'nestep'.
+ *
+ * Note this pruning method is only supported by LIST partitioning.
*/
static Bitmapset *
perform_pruning_base_step_ne(PartitionPruneContext *context,
@@ -1893,6 +1895,9 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
int n_ne_datums = list_length(nestep->exprs),
i;
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(list_length(nestep->exprs) == list_length(nestep->cmpfns));
+
/*
* Apply not-equal clauses. This only applies in the list
* partitioning case as this is the only partition type where
@@ -1905,8 +1910,7 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
* Some datums may require different comparison function than
* the default partitioning-specific one.
*/
- partsupfunc = (FmgrInfo **)
- palloc(n_ne_datums * sizeof(FmgrInfo *));
+ partsupfunc = (FmgrInfo **) palloc(n_ne_datums * sizeof(FmgrInfo *));
i = 0;
forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
@@ -1933,14 +1937,15 @@ perform_pruning_base_step_ne(PartitionPruneContext *context,
}
else
partsupfunc[i] = &context->partsupfunc[0];
- }
- ne_datums[i++] = datum;
+ ne_datums[i++] = datum;
+ }
}
excluded_parts = get_partitions_excluded_by_ne_datums(context, ne_datums, i,
partsupfunc);
pfree(ne_datums);
+ pfree(partsupfunc);
/* All partitions apart from those in excluded_parts match */
result = bms_add_range(NULL, 0, context->nparts - 1);
@@ -2280,7 +2285,7 @@ get_partitions_for_keys_range(PartitionPruneContext *context,
{
if (nvalues == partnatts)
{
- /* There can only be one partition. */
+ /* There can only be zero or one matching partitions. */
if (partindices[off + 1] >= 0)
return bms_make_singleton(partindices[off + 1]);
else if (partition_bound_has_default(boundinfo))
@@ -2535,7 +2540,8 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
/*
* We can only do this exclusion for list partitions because that's the
- * only case where we require all values to explicitly specified.
+ * only case where we require all values to explicitly specified in the
+ * partition boundinfo.
*/
Assert(context->strategy == PARTITION_STRATEGY_LIST);
Assert(context->partnatts == 1);
@@ -2569,6 +2575,12 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
* counted all this, we can eliminate any partition where the number of
* datums found in the query matches the number of datums allowed in the
* partition.
+ *
+ * It might seem like we can skip attempting pruning any partitions here
+ * if we found no not-equal clauses which match any partitions above, but
+ * because we previously ensured these clauses are strict we may be able
+ * to at least eliminate the NULL partition, providing that partition does
+ * not also allow any non-NULL values.
*/
if (!bms_is_empty(foundoffsets))
{
@@ -2579,6 +2591,13 @@ get_partitions_excluded_by_ne_datums(PartitionPruneContext *context,
datums_in_query[boundinfo->indexes[i]]++;
}
+ /*
+ * We can't prune anything if there's no foundoffsets and no NULL
+ * partition.
+ */
+ else if (!partition_bound_accepts_nulls(boundinfo))
+ return NULL;
+
/*
* Now, in a single pass over all the datums, count the number of datums
* permitted in each partition.
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6dd525323c..26c5281385 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2158,6 +2158,7 @@ _copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
return newnode;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 00d6252552..2cb7f086b7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1483,8 +1483,9 @@ inheritance_planner(PlannerInfo *root)
continue;
/* Add the current parent's RT index to the partitioned rels set. */
- partitioned_relids = bms_add_member(partitioned_relids,
- appinfo->parent_relid);
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
/*
* If this is the first non-excluded child, its post-planning rtable
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 7dfd8fdcaf..e2d8fdb2a2 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -758,8 +758,8 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
}
/*
- * If we've decided that clauses for subsequent partition keys would't
- * be useful for pruning, don't look.
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
*/
if (!consider_next_key)
break;
@@ -895,9 +895,9 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
ListCell *lc1;
/*
- * Locate the clause for the greatest column (whic may not
- * be the last partition key column). Actually, the last
- * element of eq_clauses must give us what we need.
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
*/
pc = llast(eq_clauses);
@@ -994,7 +994,7 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
*
* Return value:
*
- * one of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
* matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
* this means the clause is self-contradictory (which can happen only if it's
* a BoolExpr whose arguments may be self-contradictory)
@@ -1065,6 +1065,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
Oid commutator = InvalidOid,
negator = InvalidOid;
Oid cmpfn;
+ Oid exprtype;
leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
@@ -1152,7 +1153,8 @@ match_clause_to_partition_key(RelOptInfo *rel,
}
/* Check if we're going to need a cross-type comparison function. */
- if (exprType((Node *) expr) != part_scheme->partopcintype[partkeyidx])
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
{
int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
? HASHEXTENDED_PROC
@@ -1160,7 +1162,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
part_scheme->partopcintype[partkeyidx],
- exprType((Node *) expr), procnum);
+ exprtype, procnum);
/* If we couldn't find one, we cannot use this expression. */
if (!OidIsValid(cmpfn))
return PARTCLAUSE_UNSUPPORTED;
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 959964ba13..a0345a0abf 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1528,10 +1528,10 @@ typedef struct PartitionPruneStep
* where partnatts is the number of partition key columns. 'opstrategy' is the
* strategy of the operator in the clause matched to the last partition key.
* 'exprs' contains expressions which comprise the look-up key to be passed to
- * the partition bound search function. 'cmpfnids' contains the OIDs of
+ * the partition bound search function. 'cmpfns' contains the OIDs of
* comparison function used to compare aforementioned expressions with
- * partition bounds. Both 'exprs' and 'cmpfns' contain up to partnatts
- * elements.
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
*
* Once we find the offset of a partition bound using the look-up key, we
* determine which partitions to include in the result based on the value of
@@ -1563,9 +1563,9 @@ typedef struct PartitionPruneStepOp
* OpExpr clauses each containing a <> operator
*
* This is a special form of PartitionPruneStepOp, where each of the
- * expressions in 'expr' is compared using a <> operator. To prune a given
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
* partition, we must check if each of the values it allows matches the value
- * of one of the expressions in 'expr' using the corresponding comparison
+ * of one of the expressions in 'exprs' using the corresponding comparison
* function in 'cmpfns'.
*
* Note: Since we must consider every possible value of the partition key a
On 2018/03/28 12:58, David Rowley wrote:
On 27 March 2018 at 23:42, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I have managed to make the recursion go away in the attached updated
version. I guess that's the result of employing the idea of an "output
register" for individual pruning steps as mentioned in Robert's email
upthread where he detailed the "pruning steps" approach [1].
Thanks for making that work. I've only glanced at the patch, and not
taken enough time to understand how the new parts work yet.
In the meantime, I've attached some fixes for v41 which I previously
submitted for v40.
Thank you. I've merged it.
Also, I have redesigned how we derive partition indexes after running
pruning steps. Previously, for each step we'd determine the indexes of
"partitions" that are not pruned, which sometimes led to a list partition
not being pruned, as shown in the two recent examples. In the new
approach, we only keep track of the indexes of the "datums" that satisfy
individual pruning steps (both base pruning steps and combine steps) and
figure out the partition indexes only after we've determined the set of
datums that survive all pruning steps, that is, after we're done
executing all pruning steps. Whether we need to scan special partitions
like the null-only and default partitions is tracked along with the datum
indexes for each step. With this change, pruning works as expected in both examples:
create table lp (a char) partition by list (a);
create table lp_ad partition of lp for values in ('a', 'd');
create table lp_bc partition of lp for values in ('b', 'c');
create table lp_default partition of lp default;
explain (costs off) select * from lp where a > 'a' and a < 'd';
QUERY PLAN
-----------------------------------------------------------
Append
-> Seq Scan on lp_bc
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a > 'a'::bpchar) AND (a < 'd'::bpchar))
(5 rows)
CREATE TABLE listp (a INT) PARTITION BY LIST(a);
CREATE TABLE listp1_3 PARTITION OF listp FOR VALUES IN (1, 3);
EXPLAIN SELECT * FROM listp WHERE a > 1 AND a < 3;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=4)
One-Time Filter: false
(2 rows)
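To make the difference concrete, here is a minimal sketch of the lp example above. It is not the patch's code: StepResult, step_gt(), step_lt(), combine_and() and datums_to_parts() are hypothetical names, a 32-bit mask stands in for a Bitmapset, and setting scan_default to true for every inequality step is a simplifying assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NDATUMS      4
#define DEFAULT_PART 2      /* partition index of lp_default */

/* Sorted list datums of lp and the partition each one belongs to. */
static const char lp_datums[NDATUMS] = {'a', 'b', 'c', 'd'};
static const int  lp_datum_part[NDATUMS] = {0, 1, 1, 0};  /* 0 = lp_ad, 1 = lp_bc */

/* Per-step output: surviving datum indexes plus a default-partition flag. */
typedef struct StepResult
{
    uint32_t datum_idx;
    bool     scan_default;
} StepResult;

/* Base step for "a > v": keep datums greater than v (assumed rule). */
static StepResult
step_gt(char v)
{
    StepResult r = {0, true};

    for (int i = 0; i < NDATUMS; i++)
        if (lp_datums[i] > v)
            r.datum_idx |= 1u << i;
    return r;
}

/* Base step for "a < v": keep datums less than v (assumed rule). */
static StepResult
step_lt(char v)
{
    StepResult r = {0, true};

    for (int i = 0; i < NDATUMS; i++)
        if (lp_datums[i] < v)
            r.datum_idx |= 1u << i;
    return r;
}

/* Combine step for AND: intersect datum sets and default flags. */
static StepResult
combine_and(StepResult x, StepResult y)
{
    StepResult r = {x.datum_idx & y.datum_idx,
                    x.scan_default && y.scan_default};
    return r;
}

/* Translate surviving datums into a partition-index mask. */
static uint32_t
datums_to_parts(StepResult r)
{
    uint32_t parts = r.scan_default ? (1u << DEFAULT_PART) : 0;

    for (int i = 0; i < NDATUMS; i++)
        if (r.datum_idx & (1u << i))
            parts |= 1u << lp_datum_part[i];
    return parts;
}
```

Intersecting at the datum level leaves only 'b' and 'c', which map to lp_bc (plus the default partition). Translating each step to partitions first and intersecting afterwards keeps lp_ad, because 'd' satisfies a > 'a' and 'a' satisfies a < 'd'; that is the behavior seen in the earlier plan.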
Moreover, with pruning now working at a high level with datum indexes
instead of partition indexes, pruning for PartitionPruneStepOpNe is
simplified greatly. We start with a bitmapset containing the indexes of
all datums in boundinfo and simply delete the indexes of those that
appear in the query. So:
explain (costs off) select * from lp where a <> 'a' and a <> 'd';
QUERY PLAN
-------------------------------------------------------------
Append
-> Seq Scan on lp_bc
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
-> Seq Scan on lp_default
Filter: ((a <> 'a'::bpchar) AND (a <> 'd'::bpchar))
(5 rows)
where we delete indexes of 'a' and 'd' from the bitmapset initially
containing indexes of all datums, leaving us with only those of 'b' and
'c'. Also, the default partition is scanned as it would always be for a
PartitionPruneStepOpNe step.
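A small sketch of that deletion, again for the lp table above and again with made-up names (prune_ne(), ne_parts(), and the arrays are not from the patch; a 32-bit mask stands in for the bitmapset):

```c
#include <assert.h>
#include <stdint.h>

#define NE_NDATUMS      4
#define NE_DEFAULT_PART 2   /* partition index of lp_default */

/* Sorted list datums of lp and the partition each one belongs to. */
static const char ne_datums[NE_NDATUMS] = {'a', 'b', 'c', 'd'};
static const int  ne_datum_part[NE_NDATUMS] = {0, 1, 1, 0};  /* 0 = lp_ad, 1 = lp_bc */

/*
 * Start with all datum indexes set, then clear the index of every datum
 * that appears in a <> clause of the query.
 */
static uint32_t
prune_ne(const char *query_vals, int nvals)
{
    uint32_t surviving = (1u << NE_NDATUMS) - 1;

    for (int v = 0; v < nvals; v++)
        for (int d = 0; d < NE_NDATUMS; d++)
            if (ne_datums[d] == query_vals[v])
                surviving &= ~(1u << d);
    return surviving;
}

/*
 * Map surviving datums to partitions.  The default partition is always
 * included, as described for a <> step.
 */
static uint32_t
ne_parts(uint32_t surviving)
{
    uint32_t parts = 1u << NE_DEFAULT_PART;

    for (int d = 0; d < NE_NDATUMS; d++)
        if (surviving & (1u << d))
            parts |= 1u << ne_datum_part[d];
    return parts;
}
```

With both 'a' and 'd' in the query, only the datums of lp_bc survive, so lp_ad is pruned. With only a <> 'a', the datum 'd' survives and lp_ad still has to be scanned.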
Attached is the updated set of patches, which contains other miscellaneous
changes such as updated comments, besides the main changes described above.
Regards,
Amit
Attachments:
v42-0001-Add-partsupfunc-to-PartitionSchemeData.patch
From 17dd6b87944ef9b8f1788760f45c998879f4aa87 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v42 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0231f8bf7c..30459f7ba9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v42-0002-Add-more-tests-for-partition-pruning.patch
From eb46345476ef697b283729f49ea1ff8dc35baee7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v42 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
Attachment: v42-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 21fe434a209d30ef037dc5345ef815376ac9a885 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v42 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
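Reviewer's note, not part of the patch: the way later pruning steps reuse
the results of earlier ones can be pictured with the small standalone
sketch below. Everything named Sketch* or sketch_* is hypothetical and
exists only for illustration; a 32-bit mask stands in for the Bitmapset
of datum offsets, and base steps carry a precomputed result instead of
consulting boundinfo. The sketch also does not model the special case in
which a combine step with no source steps means "prune nothing".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PruneStepResult; a mask replaces the Bitmapset. */
typedef struct SketchResult
{
	uint32_t	datum_offsets;
	bool		scan_default;
	bool		scan_null;
} SketchResult;

typedef enum SketchKind
{
	SKETCH_BASE,				/* result is given directly in 'base' */
	SKETCH_OR,					/* union of the source steps' results */
	SKETCH_AND					/* intersection of the source steps' results */
} SketchKind;

#define SKETCH_MAX_SOURCES 4

typedef struct SketchStep
{
	SketchKind	kind;
	SketchResult base;			/* used only when kind == SKETCH_BASE */
	int			nsources;
	int			sources[SKETCH_MAX_SOURCES];	/* indexes of earlier steps */
} SketchStep;

/*
 * Evaluate steps in order, storing each step's result in results[i].
 * Returns false if there are no steps or if a combine step refers to
 * itself or to a later step; the final answer is results[nsteps - 1].
 */
static bool
sketch_run_steps(const SketchStep *steps, int nsteps, SketchResult *results)
{
	for (int i = 0; i < nsteps; i++)
	{
		const SketchStep *step = &steps[i];
		SketchResult res = {0, false, false};

		assert(step->nsources <= SKETCH_MAX_SOURCES);

		if (step->kind == SKETCH_BASE)
			res = step->base;
		else
		{
			for (int j = 0; j < step->nsources; j++)
			{
				int			src = step->sources[j];

				/* A source step must already have been evaluated. */
				if (src < 0 || src >= i)
					return false;

				if (step->kind == SKETCH_OR)
				{
					res.datum_offsets |= results[src].datum_offsets;
					res.scan_default = res.scan_default || results[src].scan_default;
					res.scan_null = res.scan_null || results[src].scan_null;
				}
				else if (j == 0)
					res = results[src];	/* AND: start from the first source */
				else
				{
					res.datum_offsets &= results[src].datum_offsets;
					res.scan_default = res.scan_default && results[src].scan_default;
					res.scan_null = res.scan_null && results[src].scan_null;
				}
			}
		}
		results[i] = res;
	}

	return nsteps > 0;
}
```

A caller would fill an array of SketchStep in evaluation order, pass a
results array of the same length, and read the last slot, which is the
same convention get_matching_partitions() follows with step_results.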
src/backend/catalog/partition.c | 1099 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 53 +
src/backend/nodes/nodeFuncs.c | 46 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1665 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 96 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 314 ++++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 8 +
18 files changed, 3380 insertions(+), 85 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..34ab985b86 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,18 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ Bitmapset *datum_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +209,26 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1652,1076 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each slot
+ * will hold a PruneStepResult after performing the corresponding pruning
+ * step. Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+ case T_PartitionPruneStepOpNe:
+ step_results[step->step_id] =
+ perform_pruning_base_step_ne(context,
+ (PartitionPruneStepOpNe *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose
+ * corresponding partitions need to be in the result, including special
+ * null-accepting and default partitions. Collect the actual partition
+ * indexes now.
+ */
+ i = -1;
+ result = NULL;
+ last_step_result = step_results[num_steps - 1];
+ while ((i = bms_next_member(last_step_result->datum_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * The result also indicates whether the special null-accepting and/or
+ * default partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /* There must be the same number of expressions and compare functions. */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of the
+ * get_partitions_for_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_base_step_ne
+ * Returns indexes of datums in context->boundinfo that are not contained
+ * in nestep->exprs
+ *
+ * If there is a special null-accepting partition that accepts no other datum,
+ * then scan_null of the returned PruneStepResult will be set. Also,
+ * scan_default is always set in this case.
+ *
+ * Note that this pruning method only supports LIST partitioning.
+ */
+static PruneStepResult *
+perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ PruneStepResult *result;
+ ListCell *lc1;
+ ListCell *lc2;
+ Bitmapset *foundoffsets = NULL;
+
+ /*
+ * We can only do this exclusion for list partitions because that's the
+ * only case where we require all values to be explicitly specified in the
+ * partition boundinfo.
+ */
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ /* There must be the same number of expressions and compare functions. */
+ Assert(list_length(nestep->exprs) == list_length(nestep->cmpfns));
+ Assert(context->partnatts == 1);
+
+ /*
+ * Check if the datums in the query are in any of the partitions. If
+ * found, store their offsets in foundoffsets.
+ */
+ forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ Oid cmpfn = lfirst_oid(lc2);
+ Datum datum;
+
+ /*
+ * Note that only the partition key at index 0 is consulted below,
+ * because there can be only one partition key with list partitioning.
+ */
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ FmgrInfo partsupfunc;
+ int offset;
+ bool is_equal;
+ /*
+ * If this datum is not the same type as the partition
+ * key then we'll need to use the comparison function
+ * for that type. We'll need to look up the FmgrInfo.
+ */
+ if (cmpfn != context->partsupfunc[0].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc);
+ else
+ partsupfunc = context->partsupfunc[0];
+
+ offset = partition_list_bsearch(&partsupfunc, partcollation,
+ boundinfo,
+ datum, &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* All datums apart from those in foundoffsets match */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ result->datum_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->datum_offsets = bms_del_members(result->datum_offsets,
+ foundoffsets);
+ /*
+ * Because the clauses from which these datums were extracted are all
+ * strict, we can also exclude the NULL (-only!) partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ !bms_is_member(boundinfo->null_index, result->datum_offsets))
+ result->scan_null = true;
+ /* Always scan the default partition. */
+ result->scan_default = true;
+
+ return result;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including the null
+ * and/or default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (cstep->source_stepids == NIL)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->datum_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep = true;
+
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain valid result, which is
+ * confirmed by the fact that cstep's ID is greater than
+ * step_id.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ /*
+ * Add indexes of datums given by the argument step's
+ * result.
+ */
+ result->datum_offsets =
+ bms_add_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ break;
+
+ case COMBINE_AND:
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->datum_offsets = step_result->datum_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Only keep indexes of datums that are in argument
+ * step's result.
+ */
+ result->datum_offsets =
+ bms_int_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Determine the offset of datum in context->boundinfo that matches
+ * the key per hash partitioning
+ *
+ * Since there are neither of the special partitions (null and default) in
+ * case of hash partitioning, scan_null and scan_default are not set.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->datum_offsets = bms_make_singleton(rowHash % greatest_modulus);
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the offsets of datums matching the specified values using
+ * list partitioning.
+ *
+ * If special partitions (null and default) need to be scanned for given
+ * values, set scan_null and scan_default in result if present.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, since list partitioning has one key.
+ * 'value' contains the value to use for pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains list partitioning comparison function to be used to
+ * perform partition_list_bsearch
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ {
+ result->scan_null = true;
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition, because
+ * list partitions divide the key space in a discontinuous manner, so not
+ * all values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ {
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(partindices[off] >= 0);
+ result->datum_offsets = bms_make_singleton(off);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the offsets of datums matching the specified values using
+ * range partitioning.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains range partitioning comparison function to be used to
+ * perform partition_range_datum_bsearch or partition_rbound_datum_cmp
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partitions. */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off + 1;
+ }
+ }
+ else if (off >= 0)
+ {
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in.
+ */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ /*
+ * If the query does not constrain all key columns,
+ * add the default partition.
+ */
+ if (nvalues < partnatts &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ /*
+ * Look-up value is smaller than all datums, so only the default
+ * partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ /* none qualifies. */
+ else
+ return result;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so include
+ * all datums in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such datums
+ * in the result (that is, set minoff to the index of
+ * smallest such datum) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in. Treat that one as the minimum
+ * datum to be included in the result.
+ */
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such datums
+ * in the result (that is, set maxoff to the index of
+ * greatest such datum) or find the greatest one that's
+ * smaller than the look-up value and set maxoff to that.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in. Treat that one as the maximum
+ * datum to be included in the result, unless the look-up
+ * value equals a bound and the operator is not inclusive.
+ */
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ int i;
+
+ /*
+ * If the query does not constrain all key columns, add the default
+ * partition.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = true;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ if (partindices[i] < 0)
+ result->scan_default = true;
+ }
+
+ if (minoff > maxoff)
+ return result;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
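To make the "off + 1" convention in get_partitions_for_keys_range() easier to follow, here is a minimal sketch of the equality lookup for a single integer key. It is not code from the patch: ToyRangeBounds, toy_range_bsearch() and toy_partition_for_eq() are hypothetical names, and the layout (a sorted bounds[] array plus an indexes[] array with one more entry, where -1 marks a gap) is only modeled on how boundinfo->datums and boundinfo->indexes are used above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of single-column range bounds (all names hypothetical).
 * bounds[] is sorted and has nbounds entries; indexes[] has nbounds + 1
 * entries.  indexes[i] is the partition whose upper bound is bounds[i],
 * or -1 if no partition covers the values just below that bound.
 */
typedef struct ToyRangeBounds
{
    int         nbounds;
    const int  *bounds;
    const int  *indexes;
} ToyRangeBounds;

/* Return the offset of the greatest bound <= value, or -1 if none. */
static int
toy_range_bsearch(const ToyRangeBounds *b, int value, bool *is_equal)
{
    int         lo = -1;
    int         hi = b->nbounds - 1;

    *is_equal = false;
    while (lo < hi)
    {
        int         mid = (lo + hi + 1) / 2;

        if (b->bounds[mid] <= value)
        {
            lo = mid;
            *is_equal = (b->bounds[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Partition index for "key = value", or -1 if only a default partition
 * (not modeled here) could hold the value.  Lower bounds are inclusive,
 * so the matching partition is the one whose upper bound is at off + 1.
 */
static int
toy_partition_for_eq(const ToyRangeBounds *b, int value)
{
    bool        is_equal;
    int         off = toy_range_bsearch(b, value, &is_equal);

    return b->indexes[off + 1];
}
```

With two partitions [10, 20) and [30, 40), bounds would be {10, 20, 30, 40} and indexes {-1, 0, -1, 1, -1}; a look-up of 25 lands in the gap and returns -1, which corresponds to the places above where only scan_default is set.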
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..025d0caf5b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,50 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOpNe
+ */
+static PartitionPruneStepOpNe *
+_copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
+{
+ PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
+
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5068,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepOpNe:
+ retval = _copyPartitionPruneStepOpNe(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
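The copy functions above show that a combine step carries only a combineOp and a list of source step IDs. The code that evaluates the steps is not part of this excerpt, so the following is just a toy model of what combining could look like, with partition sets represented as 64-bit masks. ToyCombineOp and toy_combine() are hypothetical names, and treating an empty source list as "all partitions" is my reading of the comment in partprune.c about OR arms that cannot be used for pruning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's combine operators. */
typedef enum ToyCombineOp
{
    TOY_COMBINE_OR,
    TOY_COMBINE_AND
} ToyCombineOp;

/*
 * Combine the partition sets produced by earlier steps.  step_results[]
 * is indexed by step ID and each entry is a bitmask of partitions.  An
 * empty source list is taken to mean "could not prune", so every
 * partition in all_parts is returned (an assumption of this model).
 */
static uint64_t
toy_combine(ToyCombineOp op, const uint64_t *step_results,
            const int *source_stepids, int nsources, uint64_t all_parts)
{
    uint64_t    result;
    int         i;

    if (nsources == 0)
        return all_parts;

    /* AND starts from everything, OR starts from nothing. */
    result = (op == TOY_COMBINE_AND) ? all_parts : 0;
    for (i = 0; i < nsources; i++)
    {
        uint64_t    src = step_results[source_stepids[i]];

        if (op == TOY_COMBINE_AND)
            result &= src;
        else
            result |= src;
    }
    return result;
}
```

Keeping step IDs instead of pointers in the combine node is what lets the step list be copied or serialized flatly, which I assume is the reason the patch stores source_stepids as an integer list.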
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..94d57f1c17 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,26 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+
+ if (walker((Node *) nestep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2952,32 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+ PartitionPruneStepOpNe *newnode;
+
+ FLATCOPY(newnode, nestep, PartitionPruneStepOpNe);
+ MUTATE(newnode->exprs, nestep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
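A note on why set_append_rel_size() needs did_pruning in addition to live_children: an empty Relids set is represented as NULL, so NULL alone cannot distinguish "pruning was not attempted" from "pruning left no partitions". The sketch below models that with a 64-bit mask standing in for the Relids set. toy_count_scanned_children() is a hypothetical helper and assumes child relids below 64; it only counts the children that would still be considered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Count the children that survive pruning.  live_children is a bitmask
 * standing in for the Relids set; child relids are assumed to be < 64.
 * When did_pruning is false the mask is ignored and every child counts.
 */
static int
toy_count_scanned_children(const int *child_relids, int nchildren,
                           bool did_pruning, uint64_t live_children)
{
    int         nscanned = 0;
    int         i;

    for (i = 0; i < nchildren; i++)
    {
        bool        is_live = (live_children >> child_relids[i]) & 1;

        /* The patch marks such a child as a dummy rel and moves on. */
        if (did_pruning && !is_live)
            continue;
        nscanned++;
    }
    return nscanned;
}
```

With an empty mask, did_pruning = false keeps all children while did_pruning = true keeps none, which is the distinction the flag exists to preserve.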
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..95d0e44047
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1665 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate steps needed for partition pruning
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and partition key passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate a PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, while going through the clauses, we find one that is marked pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. Caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * source_stepids to an empty List.
+ *
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_OR);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_OR));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partitions sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_AND));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *exprs = NIL;
+ List *cmpfns = NIL;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ exprs = lappend(exprs, pc->expr);
+ cmpfns = lappend_oid(cmpfns, pc->cmpfn);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_op_ne(context, exprs, cmpfns));
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_AND));
+ }
+
+ return result;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys wouldn't
+ * be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys,
+ * which get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ }
+
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_AND);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+
+ cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, procnum);
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate list of PartitionPruneStepOp steps each consisting of given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Following functions generate pruning steps of various types. Each step
+ * that's created is added to a global context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns)
+{
+ PartitionPruneStepOpNe *nestep = makeNode(PartitionPruneStepOpNe);
+
+ nestep->step.step_id = context->next_step_id++;
+ nestep->exprs = exprs;
+ nestep->cmpfns = cmpfns;
+
+ context->steps = lappend(context->steps, nestep);
+
+ return (Node *) nestep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30459f7ba9..155be722f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..df9a6ea669 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepOpNe,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a0345a0abf 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,100 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' contain the same number of items,
+ * which is at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepOpNe - Information to prune using a set of mutually AND'd
+ * OpExpr clauses each containing a <> operator
+ *
+ * This is a special form of PartitionPruneStepOp, where each of the
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
+ * partition, we must check if each of the values it allows matches the value
+ * of one of the expressions in 'exprs' using the corresponding comparison
+ * function in 'cmpfns'.
+ *
+ * Note: Since we must consider every possible value of the partition key a
+ * given partition may contain to be able to prune it using this step, we
+ * consider generating these only for list partitioned tables.
+ *----------
+ */
+typedef struct PartitionPruneStepOpNe
+{
+ PartitionPruneStep step;
+
+ List *exprs;
+ List *cmpfns;
+} PartitionPruneStepOpNe;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..71c4d0a419 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +965,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1011,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1034,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1087,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1101,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1123,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1139,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1151,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1274,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1337,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..b37524efa2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1584,6 +1585,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1598,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
+PartitionPruneStepOpNe
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1749,6 +1756,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
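To make the step node types added to primnodes.h above easier to picture, here is a minimal standalone sketch (not code from the patch) of how a list of pruning steps might be evaluated. It assumes the set of partitions fits in a uint32_t bitmask rather than a Bitmapset, and the names PruneStepSketch, op_result, run_pruning_steps and demo_hash_or_query are hypothetical. In particular, op_result merely stands in for whatever the partition bound search would return for a PartitionPruneStepOp.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STEPS   16
#define MAX_SOURCES 4

typedef enum { STEP_OP, STEP_COMBINE } StepKind;
typedef enum { COMBINE_OR, COMBINE_AND } CombineOp;

/* Hypothetical, simplified stand-in for the PartitionPruneStep* nodes. */
typedef struct PruneStepSketch
{
	StepKind	kind;
	uint32_t	op_result;		/* STEP_OP: partitions a bound search found */
	CombineOp	combine_op;		/* STEP_COMBINE: how to merge the sources */
	int			nsources;
	int			source_stepids[MAX_SOURCES];
} PruneStepSketch;

/*
 * Evaluate the steps in order, remembering each step's result by its index
 * (playing the role of step_id).  A combine step may only refer to earlier
 * steps.  The result of the last step is the final set of partitions.
 */
static uint32_t
run_pruning_steps(const PruneStepSketch *steps, int nsteps, uint32_t all_parts)
{
	uint32_t	results[MAX_STEPS];
	int			i;
	int			j;

	assert(nsteps > 0 && nsteps <= MAX_STEPS);

	for (i = 0; i < nsteps; i++)
	{
		const PruneStepSketch *step = &steps[i];

		if (step->kind == STEP_OP)
			results[i] = step->op_result & all_parts;
		else
		{
			/* AND starts from "all partitions", OR from "no partitions" */
			uint32_t	acc = (step->combine_op == COMBINE_AND) ? all_parts : 0;

			for (j = 0; j < step->nsources; j++)
			{
				int			src = step->source_stepids[j];

				assert(src >= 0 && src < i);
				if (step->combine_op == COMBINE_AND)
					acc &= results[src];
				else
					acc |= results[src];
			}
			results[i] = acc;
		}
	}

	return results[nsteps - 1];
}

/*
 * Models the three-armed OR query from the hash partitioning tests, where
 * bit N represents partition hpN and each arm selects a single partition.
 */
static uint32_t
demo_hash_or_query(void)
{
	PruneStepSketch steps[4] = {
		{.kind = STEP_OP, .op_result = 1u << 3},	/* a = 10 and b = 'yyy' */
		{.kind = STEP_OP, .op_result = 1u << 2},	/* a = 10 and b = 'xxx' */
		{.kind = STEP_OP, .op_result = 1u << 0},	/* a is null and b is null */
		{.kind = STEP_COMBINE, .combine_op = COMBINE_OR,
		 .nsources = 3, .source_stepids = {0, 1, 2}}
	};

	return run_pruning_steps(steps, 4, 0xFu);
}
```

Under these assumptions, demo_hash_or_query() yields a mask with bits 0, 2 and 3 set, which corresponds to the hp0, hp2 and hp3 scans in the expected output for that query. Storing results by step index is what lets a combine step refer to its inputs through source_stepids.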
Attachment: v42-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 33ef7e15d17290831b309c97cbb5e66589af51e8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v42 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time the set of affected tables is known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
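As a rough illustration of the idea in the commit message above, the following standalone sketch (not code from the patch) shows how the RT indexes of unpruned partitioned tables might be bubbled up to the root. The RelSketch struct, its pruned flag, and the function names are hypothetical; the patch itself keeps the indexes in a partitioned_child_rels list on RelOptInfo.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified stand-in for a RelOptInfo in a partition tree. */
typedef struct RelSketch
{
	int			rti;			/* range table index */
	bool		is_partitioned;
	bool		pruned;			/* assumed to be set by partition pruning */
	int			nchildren;
	struct RelSketch *children[MAX_CHILDREN];
} RelSketch;

/*
 * Append the RT index of rel, and of every unpruned partitioned table below
 * it, to out[].  Returns the new number of entries.  Leaf partitions and
 * pruned subtrees contribute nothing.
 */
static int
collect_partitioned_rels(const RelSketch *rel, int *out, int n, int cap)
{
	int			i;

	if (rel->pruned || !rel->is_partitioned)
		return n;

	assert(n < cap);
	out[n++] = rel->rti;		/* start with this rel's own index */

	for (i = 0; i < rel->nchildren; i++)
		n = collect_partitioned_rels(rel->children[i], out, n, cap);

	return n;
}

/*
 * Root (rti 1) has two partitioned children, one of which (rti 3) has been
 * pruned, plus a leaf partition.  Only the indexes 1 and 2 should survive.
 */
static int
demo_collect(int *out, int cap)
{
	RelSketch	leaf4 = {.rti = 4};
	RelSketch	leaf5 = {.rti = 5};
	RelSketch	leaf6 = {.rti = 6};
	RelSketch	sub2 = {.rti = 2, .is_partitioned = true,
						.nchildren = 1, .children = {&leaf4}};
	RelSketch	sub3 = {.rti = 3, .is_partitioned = true, .pruned = true,
						.nchildren = 1, .children = {&leaf5}};
	RelSketch	root = {.rti = 1, .is_partitioned = true,
						.nchildren = 3, .children = {&sub2, &sub3, &leaf6}};

	return collect_partitioned_rels(&root, out, 0, cap);
}
```

With this tree, demo_collect() fills out[] with the indexes 1 and 2 only. Because the pruned sub-partitioned table never appears in the list, a plan node built from it would record only the partitioned tables it actually touches.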
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 025d0caf5b..26c5281385 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2305,21 +2305,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5096,9 +5081,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6870..ac39f4f547 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -612,7 +612,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -627,6 +626,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1169,12 +1169,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1246,10 +1246,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1481,6 +1483,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1580,6 +1591,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1587,7 +1613,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6114,65 +6140,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index df9a6ea669..52dec2e5ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b37524efa2..68f8ef4c22 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1609,7 +1609,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
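For readers skimming the patch above: it drops PartitionedChildRelInfo and instead keeps a partitioned_child_rels list in each RelOptInfo, seeded with the rel's own RT index and then extended with the lists of its live (non-dummy) children. The following is only a minimal standalone sketch of that bubbling-up idea, not the planner code itself. ToyRel, toy_collect() and TOY_MAX_RELS are hypothetical names made up for illustration, and the sketch folds what the patch does in two phases (set_append_rel_size() and set_append_rel_pathlist()) into one recursive function.

```c
#include <stddef.h>

#define TOY_MAX_RELS 16			/* hypothetical fixed capacity for the sketch */
#define TOY_MAX_CHILDREN 4

/* Hypothetical, heavily simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
	int			rti;			/* range table index of this rel */
	int			is_partitioned; /* is it a partitioned table? */
	int			is_dummy;		/* was it pruned away? */
	struct ToyRel *children[TOY_MAX_CHILDREN];
	int			nchildren;
	int			partitioned_child_rels[TOY_MAX_RELS];
	int			nparted;		/* number of valid entries above */
} ToyRel;

/*
 * Seed rel's list with its own RT index (if it is partitioned), then append
 * the lists of all children that were not pruned.  Leaf partitions end up
 * with an empty list, so only partitioned tables are recorded.
 */
void
toy_collect(ToyRel *rel)
{
	int			i;
	int			j;

	rel->nparted = 0;
	if (!rel->is_partitioned)
		return;

	rel->partitioned_child_rels[rel->nparted++] = rel->rti;

	for (i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		/* A pruned child contributes nothing, not even its own index. */
		if (child->is_dummy)
			continue;

		toy_collect(child);
		for (j = 0; j < child->nparted && rel->nparted < TOY_MAX_RELS; j++)
			rel->partitioned_child_rels[rel->nparted++] =
				child->partitioned_child_rels[j];
	}
}
```

With a root (RT index 1) that has a leaf child (2), a live partitioned child (3) and a pruned partitioned child (5), the root's list in this sketch comes out as {1, 3}. That matches the intent stated in the new relation.h comment, namely that only unpruned partitioned children are recorded.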
On 2018/03/28 18:29, Amit Langote wrote:
Attached is the updated set of patches, which contains other miscellaneous
changes such as updated comments, besides the main changes described above.
Sorry, one of those miscellaneous changes was a typo that would cause
compilation to fail... Sigh. Fixed in the updated version.
Thanks,
Amit
Attachments:
v43-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 17dd6b87944ef9b8f1788760f45c998879f4aa87 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v43 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
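As a rough illustration of the commit message above: the comparison functions are copied out of the partition key once, so that later pruning code can compare a clause's constant against partition bounds without going back to the relcache. The sketch below is a toy model under stated assumptions, not PostgreSQL code. ToyPartKey, ToyPartScheme, toy_make_scheme() and toy_find_range_partition() are hypothetical names, plain function pointers stand in for FmgrInfo, and a single long column stands in for real datums.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-column comparison support function. */
typedef int (*cmp_fn) (long a, long b);

/* Toy model of the partition key, which owns the original functions. */
typedef struct ToyPartKey
{
	int			natts;
	cmp_fn	   *supfunc;
} ToyPartKey;

/* Toy model of the planner-side scheme that caches a copy of them. */
typedef struct ToyPartScheme
{
	int			natts;
	cmp_fn	   *supfunc;
} ToyPartScheme;

int
cmp_long(long a, long b)
{
	return (a > b) - (a < b);
}

/* Copy the comparators once, when the scheme is built. */
ToyPartScheme *
toy_make_scheme(const ToyPartKey *key)
{
	ToyPartScheme *scheme = malloc(sizeof(*scheme));
	int			i;

	if (scheme == NULL)
		return NULL;
	scheme->natts = key->natts;
	scheme->supfunc = malloc(sizeof(cmp_fn) * key->natts);
	if (scheme->supfunc == NULL)
	{
		free(scheme);
		return NULL;
	}
	for (i = 0; i < key->natts; i++)
		scheme->supfunc[i] = key->supfunc[i];
	return scheme;
}

/*
 * Given sorted exclusive upper bounds of range partitions on one column,
 * return the index of the partition that could hold 'value', using only
 * the cached comparator.  An index equal to nbounds means the partition
 * above the last bound.
 */
int
toy_find_range_partition(const ToyPartScheme *scheme,
						 const long *upper_bounds, int nbounds, long value)
{
	int			lo = 0;
	int			hi = nbounds;

	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (scheme->supfunc[0] (value, upper_bounds[mid]) < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}
```

For bounds {1, 2}, this sketch maps the values 0, 1 and 5 to partition indexes 0, 1 and 2 respectively. The actual patch uses fmgr_info_copy() into CurrentMemoryContext rather than copying bare function pointers.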
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0231f8bf7c..30459f7ba9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v43-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From eb46345476ef697b283729f49ea1ff8dc35baee7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v43 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 255 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 86 ++++++++-
2 files changed, 339 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..e2b90f3263 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,257 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..38b5f68658 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,88 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because prefix of keys is available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- pruning should work fine in this case, too.
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- case for list partitioned table that's not root
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because only the 2nd column is constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning with just 1st column constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning with just both columns constrained
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause values require different parsupfunc to be used for
+-- comparison
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
Attachment: v43-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From f01e35b324ea3edc212332c5189648acdfa27eb8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v43 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
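Note for readers: the step-based evaluation done by get_matching_partitions() in this patch can be pictured with the small, self-contained sketch below. It is only an illustration under simplifying assumptions, not the patch's code: Step, StepKind and run_steps are hypothetical names, a uint64_t bitmask stands in for a Bitmapset, base steps carry a precomputed result, and the scan_null/scan_default flags and the datum-offset-to-partition-index mapping are left out. What it does mirror is that each step stores its result in its own slot, that combine steps may only refer to earlier steps, that a combine step without arguments means "do not prune", and that the final answer is the last step's result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the patch. */
typedef enum { STEP_BASE, STEP_AND, STEP_OR } StepKind;

typedef struct Step
{
    StepKind kind;
    uint64_t base_result;   /* STEP_BASE: precomputed bitmask of matches */
    int      nsources;      /* STEP_AND/STEP_OR: number of argument steps */
    int      sources[4];    /* indexes of earlier steps to combine */
} Step;

/*
 * Run the steps in order, keeping one result per step, and return the
 * result of the last step.  With no steps at all, nothing is pruned.
 */
static uint64_t
run_steps(const Step *steps, int nsteps, int nparts)
{
    uint64_t results[16];
    uint64_t all;
    int      i, j;

    assert(nsteps >= 0 && nsteps <= 16);
    assert(nparts >= 0 && nparts <= 64);
    all = (nparts == 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);

    if (nsteps == 0)
        return all;

    for (i = 0; i < nsteps; i++)
    {
        const Step *s = &steps[i];

        if (s->kind == STEP_BASE)
        {
            results[i] = s->base_result & all;
            continue;
        }

        assert(s->nsources >= 0 && s->nsources <= 4);

        /* A combine step without arguments means "do not prune". */
        if (s->nsources == 0)
        {
            results[i] = all;
            continue;
        }

        /* AND starts from everything, OR starts from nothing. */
        results[i] = (s->kind == STEP_AND) ? all : 0;
        for (j = 0; j < s->nsources; j++)
        {
            int src = s->sources[j];

            /* Only results of earlier steps are valid arguments. */
            assert(src >= 0 && src < i);
            if (s->kind == STEP_AND)
                results[i] &= results[src];
            else
                results[i] |= results[src];
        }
    }

    return results[nsteps - 1];
}
```

In this simplified picture, a WHERE clause such as "a > 1 AND a < 5" would become two base steps followed by one STEP_AND step naming both of them as sources.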
src/backend/catalog/partition.c | 1099 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 53 +
src/backend/nodes/nodeFuncs.c | 46 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1665 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 4 +
src/include/nodes/primnodes.h | 96 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 314 ++++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 8 +
18 files changed, 3380 insertions(+), 85 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..3c26daa098 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,18 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ Bitmapset *datum_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +209,26 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_partitions_for_keys_range(
+ PartitionPruneContext *context, int opstrategy,
+ Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1652,1076 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for the pruning steps to store their results. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The result
+ * of applying all pruning steps is the value contained in the slot of the
+ * last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+ case T_PartitionPruneStepOpNe:
+ step_results[step->step_id] =
+ perform_pruning_base_step_ne(context,
+ (PartitionPruneStepOpNe *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, and whether the special
+ * null-accepting and default partitions need to be included. Collect the
+ * actual partition indexes now.
+ */
+ i = -1;
+ result = NULL;
+ last_step_result = step_results[num_steps - 1];
+ while ((i = bms_next_member(last_step_result->datum_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also contains whether special null-accepting and/or default
+ * partition need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /* There must be as many compare functions as expressions. */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of the
+ * get_partitions_for_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_partitions_for_keys_hash(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_partitions_for_keys_list(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_partitions_for_keys_range(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_base_step_ne
+ * Returns indexes of datums in context->boundinfo that are not contained
+ * in nestep->exprs
+ *
+ * If there is a special null-accepting partition that accepts no other datum,
+ * then scan_null of the returned PruneStepResult will be set. Also,
+ * scan_default is always set in this case.
+ *
+ * Note that this pruning method only supports LIST partitioning.
+ */
+static PruneStepResult *
+perform_pruning_base_step_ne(PartitionPruneContext *context,
+ PartitionPruneStepOpNe *nestep)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ PruneStepResult *result;
+ ListCell *lc1;
+ ListCell *lc2;
+ Bitmapset *foundoffsets = NULL;
+
+ /*
+ * We can only do this exclusion for list partitions because that's the
+ * only case where we require all values to be explicitly specified in the
+ * partition boundinfo.
+ */
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ /* There must be as many compare functions as expressions. */
+ Assert(list_length(nestep->exprs) == list_length(nestep->cmpfns));
+ Assert(context->partnatts == 1);
+
+ /*
+ * Check if the datums in the query are in any of the partitions. If
+ * found, store their offsets in foundoffsets.
+ */
+ forboth(lc1, nestep->exprs, lc2, nestep->cmpfns)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ Oid cmpfn = lfirst_oid(lc2);
+ Datum datum;
+
+ /*
+ * Note that we're passing 0 for partkeyidx, because there
+ * can be only one partition key with list partitioning.
+ */
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ FmgrInfo partsupfunc;
+ int offset;
+ bool is_equal;
+ /*
+ * If this datum is not the same type as the partition
+ * key then we'll need to use the comparison function
+ * for that type. We'll need to look up the FmgrInfo.
+ */
+ if (cmpfn != context->partsupfunc[0].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc);
+ else
+ partsupfunc = context->partsupfunc[0];
+
+ offset = partition_list_bsearch(&partsupfunc, partcollation,
+ boundinfo,
+ datum, &is_equal);
+ if (offset >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[offset] >= 0);
+ foundoffsets = bms_add_member(foundoffsets, offset);
+ }
+ }
+ }
+
+ /* All datums apart from those in foundoffsets match */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ result->datum_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->datum_offsets = bms_del_members(result->datum_offsets,
+ foundoffsets);
+ /*
+ * Because the clauses from which these datums were extracted are all
+ * strict, we can also exclude the NULL (-only!) partition.
+ */
+ if (partition_bound_accepts_nulls(boundinfo) &&
+ !bms_is_member(boundinfo->null_index, result->datum_offsets))
+ result->scan_null = true;
+ /* Always scan the default partition. */
+ result->scan_default = true;
+
+ return result;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that contains no
+ * argument steps, to signal us to not prune any partitions. So,
+ * return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (cstep->source_stepids == NIL)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->datum_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep = true;
+
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain valid result, which is
+ * confirmed by the fact that cstep's ID is greater than
+ * step_id.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ switch (cstep->combineOp)
+ {
+ case COMBINE_OR:
+ /*
+ * Add indexes of datums given by the argument step's
+ * result.
+ */
+ result->datum_offsets =
+ bms_add_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ break;
+
+ case COMBINE_AND:
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->datum_offsets = step_result->datum_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Only keep indexes of datums that are in argument
+ * step's result.
+ */
+ result->datum_offsets =
+ bms_int_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_partitions_for_keys_hash
+ * Determine the offset of datum in context->boundinfo that matches
+ * the key per hash partitioning
+ *
+ * Since there are neither of the special partitions (null and default) in
+ * case of hash partitioning, scan_null and scan_default are not set.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_hash(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->datum_offsets = bms_make_singleton(rowHash % greatest_modulus);
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_list
+ * Determine the offsets of datums matching the specified values using
+ * list partitioning.
+ *
+ * If special partitions (null and default) need to be scanned for given
+ * values, set scan_null and scan_default in result if present.
+ *
+ * 'nvalues', if non-zero, must be exactly 1 (list partitioning has one key).
+ * 'value' contains the value to use for pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains list partitioning comparison function to be used to
+ * perform partition_list_bsearch
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_list(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ {
+ result->scan_null = true;
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ {
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(partindices[off] >= 0);
+ result->datum_offsets = bms_make_singleton(off);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the offset of the last datum we have
+ * partitions for, the only possible partition that could contain
+ * a match is the default partition, but we must've set
+ * result->scan_default above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is negative, the value is smaller than the datums of all
+ * non-default partitions. The only possible partition that could
+ * contain a match is the default partition, but we must've set
+ * result->scan_default above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_partitions_for_keys_range
+ * Determine the offsets of datums matching the specified values using
+ * range partitioning.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ * 'opstrategy' if non-zero must be a btree strategy number
+ * 'partsupfunc' contains range partitioning comparison function to be used to
+ * perform partition_range_datum_bsearch or partition_rbound_datum_cmp
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_partitions_for_keys_range(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partitions. */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+ minoff = off;
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+ maxoff = off + 1;
+ }
+ }
+ else if (off >= 0)
+ {
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in.
+ */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ /*
+ * If the query does not constrain all key columns,
+ * add the default partition.
+ */
+ if (nvalues < partnatts &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ /*
+ * Look-up value is smaller than all datums, so only the default
+ * partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ /* none qualifies. */
+ else
+ return result;
+
+ if (partindices[minoff] < 0 &&
+ minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All partition bounds are greater than the key, so include
+ * all datums in the result.
+ */
+ off = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such datums
+ * in the result (that is, set minoff to the index of
+ * smallest such datum) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off++;
+ break;
+ }
+ off = nextoff;
+ }
+ }
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in. Treat that one as the minimum
+ * datum to be included in the result.
+ */
+ else
+ off++;
+ }
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0)
+ {
+ /*
+ * Since only a prefix of keys is provided, we must find
+ * other datums in boundinfo that match the prefix.
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such datums
+ * in the result (that is, set maxoff to the index of
+ * greatest such datum) or find the greatest one that's
+ * smaller than the look-up value and set maxoff to that.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ {
+ if (!inclusive)
+ off--;
+ break;
+ }
+ off = nextoff;
+ }
+
+ off++;
+ }
+ /*
+ * Look-up value falls in the range between datums in
+ * boundinfo. Since off would be the offset of the greatest
+ * bound that is <= look-up value, we treat off + 1 as the
+ * index of the upper bound of the partition the look-up
+ * value would fall in. Treat that one as the maximum
+ * datum to be included in the result, but only if the look-up
+ * key is inclusive.
+ */
+ else if (!is_equal || inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * All partition bounds are greater than the key, so select
+ * none of the partitions, except the default.
+ */
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 && minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 && maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ int i;
+
+ /*
+ * If the query does not constrain all key columns, add the default
+ * partition.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = true;
+
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ if (partindices[i] < 0)
+ result->scan_default = true;
+ }
+
+ if (minoff > maxoff)
+ return result;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..025d0caf5b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,50 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepOpNe
+ */
+static PartitionPruneStepOpNe *
+_copyPartitionPruneStepOpNe(const PartitionPruneStepOpNe *from)
+{
+ PartitionPruneStepOpNe *newnode = makeNode(PartitionPruneStepOpNe);
+
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5068,15 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepOpNe:
+ retval = _copyPartitionPruneStepOpNe(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..94d57f1c17 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,26 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+
+ if (walker((Node *) nestep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2952,32 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepOpNe:
+ {
+ PartitionPruneStepOpNe *nestep =
+ (PartitionPruneStepOpNe *) node;
+ PartitionPruneStepOpNe *newnode;
+
+ FLATCOPY(newnode, nestep, PartitionPruneStepOpNe);
+ MUTATE(newnode->exprs, nestep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..95d0e44047
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1665 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate steps needed for partition pruning
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate a PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find any that are marked as pseudoconstant
+ * and contain a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. Caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * source_stepids to an empty List.
+ *
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_OR);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_OR));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partitions sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_AND));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *exprs = NIL;
+ List *cmpfns = NIL;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+
+ exprs = lappend(exprs, pc->expr);
+ cmpfns = lappend_oid(cmpfns, pc->cmpfn);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_op_ne(context, exprs, cmpfns));
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than one.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_AND));
+ }
+
+ return result;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause; otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * Clauses for the next key may not be needed, depending on
+ * this clause's operator strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys wouldn't
+ * be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys,
+ * which get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_AND);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case: operators named '<>' are not listed in any operator
+ * family whatsoever, so we try to perform partition pruning with
+ * them only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+
+ cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, procnum);
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate list of PartitionPruneStepOp steps each consisting of given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each
+ * step that's created is added to context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_op_ne(GeneratePruningStepsContext *context,
+ List *exprs, List *cmpfns)
+{
+ PartitionPruneStepOpNe *nestep = makeNode(PartitionPruneStepOpNe);
+
+ nestep->step.step_id = context->next_step_id++;
+ nestep->exprs = exprs;
+ nestep->cmpfns = cmpfns;
+
+ context->steps = lappend(context->steps, nestep);
+
+ return (Node *) nestep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
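
A note on get_steps_using_prefix() above, for readers following along. When
'prefix' holds more than one clause for the same key, one step is generated
per combination of clauses across the earlier keys, each with the last
expression appended. The standalone toy below is only a sketch of that
enumeration and is not code from the patch: ToyClause, ToyStep, MAX_KEYS and
toy_steps_using_prefix are hypothetical names, plain ints stand in for the
expressions, and the prefix is assumed to be sorted by key number.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 4				/* hypothetical cap for this toy only */

/* Toy stand-in for a matched clause: key number plus one comparison value. */
typedef struct ToyClause
{
	int			keyno;
	int			value;
} ToyClause;

/* One generated look-up vector: values for keys 0 .. nvalues - 1. */
typedef struct ToyStep
{
	int			nvalues;
	int			values[MAX_KEYS];
} ToyStep;

/*
 * Pick one clause per key from prefix[start..], recursing key by key, and
 * emit a step once every prefix key has a value.  Returns the updated
 * number of steps stored in 'out'.
 */
static int
toy_steps_recurse(const ToyClause *prefix, int nprefix, int start,
				  int lastvalue, int *cur, int depth,
				  ToyStep *out, int nout, int maxout)
{
	int			i;
	int			keyno;
	int			next;

	if (start >= nprefix)
	{
		/* Every prefix key is chosen: append the last value and emit. */
		assert(nout < maxout && depth < MAX_KEYS);
		for (i = 0; i < depth; i++)
			out[nout].values[i] = cur[i];
		out[nout].values[depth] = lastvalue;
		out[nout].nvalues = depth + 1;
		return nout + 1;
	}

	/* Find where the clauses for the next key begin. */
	keyno = prefix[start].keyno;
	next = start;
	while (next < nprefix && prefix[next].keyno == keyno)
		next++;

	/* Try each clause of the current key in turn. */
	assert(depth < MAX_KEYS);
	for (i = start; i < next; i++)
	{
		cur[depth] = prefix[i].value;
		nout = toy_steps_recurse(prefix, nprefix, next, lastvalue,
								 cur, depth + 1, out, nout, maxout);
	}
	return nout;
}

/* Entry point: returns the number of steps written to 'out'. */
int
toy_steps_using_prefix(const ToyClause *prefix, int nprefix, int lastvalue,
					   ToyStep *out, int maxout)
{
	int			cur[MAX_KEYS];

	return toy_steps_recurse(prefix, nprefix, 0, lastvalue,
							 cur, 0, out, 0, maxout);
}
```

With a prefix of two clauses on key 0 and one clause on key 1, this yields
two look-up vectors, one per clause of key 0. An empty prefix yields a single
one-element vector, which corresponds to the "quick exit" in the patch.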
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30459f7ba9..155be722f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
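
For orientation: get_matching_partitions() declared above takes the flat
list of pruning steps, in which each combine step refers to earlier steps by
step_id. The toy below sketches one way a consumer of such a list could
evaluate it. It is not the patch's implementation, which is not part of this
hunk. It assumes that step ids equal array positions, that each op step's
set of matching partitions is already known as a bitmask, that the last
step's result is the overall answer, and that an OR combine exists alongside
the AND one. All Toy* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_SOURCES 4		/* hypothetical limit for this toy only */

typedef enum
{
	TOY_STEP_OP,				/* leaf step: result given directly */
	TOY_STEP_AND,				/* intersect results of source steps */
	TOY_STEP_OR					/* union results of source steps */
} ToyStepKind;

typedef struct ToyPruneStep
{
	ToyStepKind kind;
	uint32_t	op_result;		/* TOY_STEP_OP only: matching partitions */
	int			nsources;		/* combine steps only */
	int			source_ids[TOY_MAX_SOURCES];
} ToyPruneStep;

/*
 * Evaluate steps in id order, storing each result in results[id].  A
 * combine step may only refer to steps with smaller ids, so its inputs
 * are always available.  Returns the result of the last step.
 */
uint32_t
toy_eval_steps(const ToyPruneStep *steps, int nsteps,
			   uint32_t *results, uint32_t all_parts)
{
	int			i;
	int			j;

	assert(nsteps > 0);
	for (i = 0; i < nsteps; i++)
	{
		const ToyPruneStep *s = &steps[i];

		if (s->kind == TOY_STEP_OP)
			results[i] = s->op_result & all_parts;
		else
		{
			/* AND starts from "all partitions", OR from "none". */
			uint32_t	acc = (s->kind == TOY_STEP_AND) ? all_parts : 0;

			for (j = 0; j < s->nsources; j++)
			{
				int			src = s->source_ids[j];

				assert(src >= 0 && src < i);
				if (s->kind == TOY_STEP_AND)
					acc &= results[src];
				else
					acc |= results[src];
			}
			results[i] = acc;
		}
	}
	return results[nsteps - 1];
}
```

Keeping the steps in one flat array keyed by id means a combine step needs
only small integers to name its inputs, which is what the step_ids lists
built in the patch carry.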
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..df9a6ea669 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,10 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepOpNe,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a0345a0abf 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,100 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' always contain the same number of
+ * items, at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. A given partition key can
+ * never have both an expression in 'exprs' and its corresponding bit set in
+ * 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepOpNe - Information to prune using a set of mutually AND'd
+ * OpExpr clauses each containing a <> operator
+ *
+ * This is a special form of PartitionPruneStepOp, where each of the
+ * expressions in 'exprs' is compared using a <> operator. To prune a given
+ * partition, we must check if each of the values it allows matches the value
+ * of one of the expressions in 'exprs' using the corresponding comparison
+ * function in 'cmpfns'.
+ *
+ * Note: Since we must consider every possible value of the partition key a
+ * given partition may contain to be able to prune it using this step, we
+ * generate these steps only for list-partitioned tables.
+ *----------
+ */
+typedef struct PartitionPruneStepOpNe
+{
+ PartitionPruneStep step;
+
+ List *exprs;
+ List *cmpfns;
+} PartitionPruneStepOpNe;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For a BoolExpr clause, we combine the sets of partitions determined for
+ * each of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_OR,
+ COMBINE_AND
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e2b90f3263..71c4d0a419 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -963,9 +965,11 @@ explain (costs off) select * from mc2p where a = 2 and b < 1;
QUERY PLAN
---------------------------------------
Append
+ -> Seq Scan on mc2p2
+ Filter: ((b < 1) AND (a = 2))
-> Seq Scan on mc2p3
Filter: ((b < 1) AND (a = 2))
-(3 rows)
+(5 rows)
explain (costs off) select * from mc2p where a > 1;
QUERY PLAN
@@ -1007,24 +1011,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1034,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1098,11 +1087,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1110,13 +1101,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(30 rows)
-- pruning should work fine, because prefix of keys is available
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
@@ -1124,11 +1123,13 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-----------------------------------------------------------------------
Nested Loop
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_3
Filter: (a = 1)
-> Aggregate
-> Append
@@ -1138,7 +1139,7 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p_default t2_2
Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
-(16 rows)
+(18 rows)
-- pruning should work fine in this case, too.
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
@@ -1150,13 +1151,15 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
-> Seq Scan on mc3p1 t2
Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
-> Append
- -> Seq Scan on mc2p1 t1
+ -> Seq Scan on mc2p0 t1
Filter: (a = 1)
- -> Seq Scan on mc2p2 t1_1
+ -> Seq Scan on mc2p1 t1_1
Filter: (a = 1)
- -> Seq Scan on mc2p_default t1_2
+ -> Seq Scan on mc2p2 t1_2
Filter: (a = 1)
-(12 rows)
+ -> Seq Scan on mc2p_default t1_3
+ Filter: (a = 1)
+(14 rows)
--
-- pruning with clauses containing <> operator
@@ -1271,22 +1274,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning with just both columns constrained
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1340,3 +1337,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 38b5f68658..86a3a3e7ce 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -237,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..b37524efa2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1584,6 +1585,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1598,11 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
+PartitionPruneStepOpNe
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1749,6 +1756,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
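The PartitionPruneStepCombine node added to primnodes.h in the patch above combines the partition sets produced by its source steps using COMBINE_OR or COMBINE_AND. The following is only a minimal sketch of that set arithmetic, not the patch's implementation: PartSet, SketchCombineOp and sketch_combine() are hypothetical names, a 64-bit mask stands in for a Bitmapset, and starting an AND from "all partitions" is an assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means partition i is kept. */
typedef uint64_t PartSet;

/* Hypothetical mirror of PartitionPruneCombineOp, renamed to avoid confusion. */
typedef enum
{
	SKETCH_COMBINE_OR,
	SKETCH_COMBINE_AND
} SketchCombineOp;

/*
 * Combine the partition sets of 'n' source steps.
 *
 * OR starts from the empty set and takes the union; AND starts from the set
 * of all 'nparts' partitions (an assumption of this sketch) and takes the
 * intersection.
 */
static PartSet
sketch_combine(SketchCombineOp op, const PartSet *sources, size_t n,
			   int nparts)
{
	PartSet		all;
	PartSet		result;
	size_t		i;

	assert(nparts > 0 && nparts <= 64);
	all = (nparts == 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);

	result = (op == SKETCH_COMBINE_OR) ? 0 : all;
	for (i = 0; i < n; i++)
	{
		if (op == SKETCH_COMBINE_OR)
			result |= sources[i];
		else
			result &= sources[i];
	}

	/* Never report partitions beyond nparts. */
	return result & all;
}
```

Applied to the hp regression test above, the three OR arms select hp3, hp2 and hp0 respectively when tested on their own, so the union is {hp0, hp2, hp3}, which matches the three scans in the expected plan for the OR query.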
Attachment: v43-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From d21007bc90fbbaffcdc62079a078e351c297ed05 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v43 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase, when it's not
yet known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node or
the related code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
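The commit message above describes collecting RT indexes only from partitioned tables that survive pruning. As a rough illustration of that bubbling-up, here is a minimal sketch under stated assumptions: SketchRel is a hypothetical, heavily simplified stand-in for RelOptInfo, a fixed-size int array replaces the integer List, and the two planner phases (set_append_rel_size() and set_append_rel_pathlist()) are collapsed into a single recursive function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_RELS 32

/* Hypothetical, simplified stand-in for RelOptInfo. */
typedef struct SketchRel
{
	int			rti;			/* range table index */
	bool		is_partitioned; /* is this a partitioned table? */
	bool		is_dummy;		/* proven empty, e.g. pruned */
	int			partitioned_child_rels[SKETCH_MAX_RELS];
	size_t		npartitioned;
	struct SketchRel **children;
	size_t		nchildren;
} SketchRel;

/*
 * Fill rel->partitioned_child_rels with rel's own index (if partitioned)
 * followed by the indexes collected from every live child.  Dummy children
 * are skipped, so pruned sub-partitioned tables never reach the parent.
 */
static void
sketch_collect(SketchRel *rel)
{
	size_t		i;
	size_t		j;

	rel->npartitioned = 0;
	if (rel->is_partitioned)
		rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (i = 0; i < rel->nchildren; i++)
	{
		SketchRel  *child = rel->children[i];

		if (child->is_dummy)
			continue;

		sketch_collect(child);

		/* Only partitioned parents accumulate their children's lists. */
		if (!rel->is_partitioned)
			continue;

		for (j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < SKETCH_MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

With a root table (RT index 1) that has one live sub-partitioned child (index 2) and one pruned sub-partitioned child (index 3), the root ends up with the list {1, 2}. Under the old approach described in the commit message, index 3 would have been recorded as well.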
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 025d0caf5b..26c5281385 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2305,21 +2305,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5096,9 +5081,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6870..ac39f4f547 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -612,7 +612,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -627,6 +626,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1169,12 +1169,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1246,10 +1246,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1481,6 +1483,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_rels set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1580,6 +1591,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1587,7 +1613,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6114,65 +6140,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index df9a6ea669..52dec2e5ef 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b37524efa2..68f8ef4c22 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1609,7 +1609,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 28 March 2018 at 22:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Also, I have redesigned how we derive partition indexes after running
pruning steps. Previously, for each step we'd determine the indexes of
"partitions" that are not pruned leading to a list partition not being
pruned sometimes, as shown in the two recent examples. Instead, in the
new approach, we only keep track of the indexes of the "datums" that
satisfy individual pruning steps (both base pruning steps and combine
steps) and only figure out the partition indexes after we've determined
the set of datums that survive all pruning steps. That is, after we're done
executing all pruning steps. Whether we need to scan special partitions
like null-only and default partition is tracked along with datum indexes
for each step. With this change, pruning works as expected in both examples:
Smart thinking! Good to see that solved.
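To make the quoted idea concrete, here is a minimal sketch of why tracking datum indexes across steps can prune more than tracking partition indexes per step. It is not code from the patch: the four-datum list layout, the bitmask representation, and all function names (datums_not_equal, datums_to_partitions, prune_per_step_partitions, prune_by_datums) are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical list-partitioned table: partition 0 accepts the values
 * 1 and 2, partition 1 accepts the values 3 and 4.
 */
#define NDATUMS 4

static const int datum_values[NDATUMS] = {1, 2, 3, 4};
static const int partition_of[NDATUMS] = {0, 0, 1, 1};

/* Bitmask of datum indexes whose value differs from 'value'. */
static uint32_t
datums_not_equal(int value)
{
	uint32_t	result = 0;
	int			i;

	for (i = 0; i < NDATUMS; i++)
		if (datum_values[i] != value)
			result |= (uint32_t) 1 << i;
	return result;
}

/* Map a set of datum indexes to the set of partitions accepting them. */
static uint32_t
datums_to_partitions(uint32_t datums)
{
	uint32_t	parts = 0;
	int			i;

	for (i = 0; i < NDATUMS; i++)
		if (datums & ((uint32_t) 1 << i))
			parts |= (uint32_t) 1 << partition_of[i];
	return parts;
}

/* Convert to partitions after every step, then intersect the results. */
static uint32_t
prune_per_step_partitions(int v1, int v2)
{
	return datums_to_partitions(datums_not_equal(v1)) &
		datums_to_partitions(datums_not_equal(v2));
}

/* Intersect datum sets first; derive partitions only once at the end. */
static uint32_t
prune_by_datums(int v1, int v2)
{
	return datums_to_partitions(datums_not_equal(v1) &
								datums_not_equal(v2));
}
```

With a qual like "a <> 1 and a <> 2", the per-step variant keeps partition 0 alive, because each step on its own still admits one of that partition's two values, whereas the datum-based variant leaves only partition 1.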
I'll try to look at v43 during my working day tomorrow, in around 9 hours time.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 03/28/2018 06:30 AM, Amit Langote wrote:
On 2018/03/28 18:29, Amit Langote wrote:
Attached is the updated set of patches, which contains other miscellaneous
changes such as updated comments, besides the main changes described above.
Sorry, one of those miscellaneous changes was a typo that would cause
compilation to fail... Sigh. Fixed in the updated version.
Just some trivial changes.
However,
explain (costs off) select * from mc2p where a = 2 and b < 1;
is picking up
-> Seq Scan on mc2p2
Filter: ((b < 1) AND (a = 2))
which doesn't seem right, as its definition is
create table mc2p2 partition of mc2p for values from (1, 1) to (2,
minvalue);
Best regards,
Jesper
Attachments:
delta_v43.patch (text/x-patch)
From 8bb5f25d31471910db2e7907b4c13029edd7bbdf Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Wed, 28 Mar 2018 12:39:42 -0400
Subject: [PATCH] Trivial changes
---
src/backend/catalog/partition.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3c26daa098..2ee5ce605c 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1774,7 +1774,7 @@ perform_pruning_base_step(PartitionPruneContext *context,
Datum values[PARTITION_MAX_KEYS];
FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
- /* There better be same number of expressions and compare functions. */
+ /* There better be the same number of expressions and compare functions. */
Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
nvalues = 0;
@@ -1783,7 +1783,7 @@ perform_pruning_base_step(PartitionPruneContext *context,
/*
* Generate the partition look-up key that will be used by one of
- * get_partitions_from_keys_* functions called below.
+ * the get_partitions_from_keys_* functions called below.
*/
for (keyno = 0; keyno < context->partnatts; keyno++)
{
@@ -1969,7 +1969,7 @@ perform_pruning_combine_step(PartitionPruneContext *context,
PruneStepResult *result = NULL;
/*
- * In some cases, planner generates a combine step that doesn't contain
+ * In some cases, the planner generates a combine step that doesn't contain
* any argument steps, to signal us to not prune any partitions. So,
* return indexes of all datums in that case, including null and/or
* default partition, if any.
@@ -1994,7 +1994,7 @@ perform_pruning_combine_step(PartitionPruneContext *context,
PruneStepResult *step_result;
/*
- * step_results[step_id] must contain valid result, which is
+ * step_results[step_id] must contain a valid result, which is
* confirmed by the fact that cstep's ID is greater than
* step_id.
*/
@@ -2147,10 +2147,10 @@ get_partitions_for_keys_hash(PartitionPruneContext *context,
* If special partitions (null and default) need to be scanned for given
* values, set scan_null and scan_default in result if present.
*
- * 'nvalues', if non-zero, should be exactly 1, because list partitioning.
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
* 'value' contains the value to use for pruning
* 'opstrategy' if non-zero must be a btree strategy number
- * 'partsupfunc' contains list partitioning comparison function to be used to
+ * 'partsupfunc' contains the list partitioning comparison function to be used to
* perform partition_list_bsearch
* 'nullkeys' is the set of partition keys that are null.
*/
@@ -2161,7 +2161,6 @@ get_partitions_for_keys_list(PartitionPruneContext *context,
{
PruneStepResult *result;
PartitionBoundInfo boundinfo = context->boundinfo;
- int *partindices = boundinfo->indexes;
int off,
minoff,
maxoff;
@@ -2272,7 +2271,7 @@ get_partitions_for_keys_list(PartitionPruneContext *context,
{
/*
* This case means all partition bounds are greater, which in
- * turn means that all partition satisfy this key.
+ * turn means that all partitions satisfy this key.
*/
off = 0;
}
@@ -2333,7 +2332,7 @@ get_partitions_for_keys_list(PartitionPruneContext *context,
* 'values' contains values for partition keys (or a prefix) to be used for
* pruning
* 'opstrategy' if non-zero must be a btree strategy number
- * 'partsupfunc' contains range partitioning comparison function to be used to
+ * 'partsupfunc' contains the range partitioning comparison function to be used to
* perform partition_range_datum_bsearch or partition_rbound_datum_cmp
* 'nullkeys' is the set of partition keys that are null.
*/
@@ -2404,7 +2403,7 @@ get_partitions_for_keys_range(PartitionPruneContext *context,
{
if (nvalues == partnatts)
{
- /* There can only be zero or one matching partitions. */
+ /* There can only be zero or one matching partition. */
if (partindices[off + 1] >= 0)
{
result->datum_offsets = bms_make_singleton(off + 1);
@@ -2532,7 +2531,7 @@ get_partitions_for_keys_range(PartitionPruneContext *context,
* Since only a prefix of keys is provided, we must find
* other datums in boundinfo that match the prefix.
* Based on whether the look-up values is inclusive or
- * not, we must either include the indexes all such datums
+ * not, we must either include the indexes of all such datums
* in the result (that is, set minoff to the index of
* smallest such datum) or find the smallest one that's
* greater than the look-up value and set minoff to that.
--
2.13.6
On 2018/03/29 1:41, Jesper Pedersen wrote:
Just some trivial changes.
Thanks Jesper. Merged.
However,
explain (costs off) select * from mc2p where a = 2 and b < 1;
is picking up
-> Seq Scan on mc2p2
Filter: ((b < 1) AND (a = 2))
which doesn't seem right, as its definition is
create table mc2p2 partition of mc2p for values from (1, 1) to (2, minvalue);
Yeah, that wasn't right. It boiled down to how some code in the range
partition pruning function considered a tuple containing a = 2 to fall in
this partition, which is wrong because the minvalue in its upper bound
makes the partition exclusive of any tuples with a = 2. Fixed that.
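As a rough illustration of the bound semantics involved (not the pruning code itself), the sketch below checks whether a two-column tuple falls inside a range partition whose bounds may contain MINVALUE or MAXVALUE. The types BoundKind and BoundDatum and the functions tuple_cmp_bound, tuple_in_partition and mc2p2_accepts are hypothetical names chosen for this example; only the bounds of mc2p2 are taken from the definition quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2

/* Hypothetical representation of one column of a range bound. */
typedef enum
{
	BOUND_MINVALUE = -1,
	BOUND_VALUE = 0,
	BOUND_MAXVALUE = 1
} BoundKind;

typedef struct
{
	BoundKind	kind;
	int			value;			/* meaningful only for BOUND_VALUE */
} BoundDatum;

/*
 * Compare a tuple with a bound, column by column.  Returns a negative
 * number, zero, or a positive number if the tuple sorts before, equal
 * to, or after the bound.  MINVALUE sorts before every tuple value and
 * MAXVALUE after every tuple value, which ends the comparison.
 */
static int
tuple_cmp_bound(const int *tuple, const BoundDatum *bound)
{
	int			i;

	for (i = 0; i < NKEYS; i++)
	{
		if (bound[i].kind == BOUND_MINVALUE)
			return 1;
		if (bound[i].kind == BOUND_MAXVALUE)
			return -1;
		if (tuple[i] < bound[i].value)
			return -1;
		if (tuple[i] > bound[i].value)
			return 1;
	}
	return 0;
}

/* Lower bound is inclusive, upper bound is exclusive. */
static bool
tuple_in_partition(const int *tuple, const BoundDatum *lower,
				   const BoundDatum *upper)
{
	return tuple_cmp_bound(tuple, lower) >= 0 &&
		tuple_cmp_bound(tuple, upper) < 0;
}

/* mc2p2: for values from (1, 1) to (2, minvalue) */
static bool
mc2p2_accepts(int a, int b)
{
	const BoundDatum lower[NKEYS] = {{BOUND_VALUE, 1}, {BOUND_VALUE, 1}};
	const BoundDatum upper[NKEYS] = {{BOUND_VALUE, 2}, {BOUND_MINVALUE, 0}};
	int			tuple[NKEYS];

	tuple[0] = a;
	tuple[1] = b;
	return tuple_in_partition(tuple, lower, upper);
}
```

Under these assumptions no tuple with a = 2 is accepted, whatever b is, because (2, b) always sorts after (2, minvalue); that is why mc2p2 should not show up in the plan for "a = 2 and b < 1".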
Besides fixing that, I have decided to get rid of
PartitionPruneStepOpNe (a special kind of base pruning step that was
being used to prune list partitions using a set of <> operator clauses)
and related functions. Instead, pruning for <> operator clauses is now
implemented by using a combination of PartitionPruneStepOp and
PartitionPruneStepCombine after adding a new combine op COMBINE_INVERT (I
also renamed COMBINE_OR and COMBINE_AND to COMBINE_UNION and
COMBINE_INTERSECT, respectively). I decided to do so because the previous
arrangement looked like a "hack" to support a special case, and it touched
quite a few places.
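One way to picture the combination of steps is sketched below. This is only an illustration under assumptions, not the patch's implementation: the CombineOp enum reuses the three names mentioned above, but op_step_eq, combine_step, prune_not_equal_both, the bitmask encoding and the sample datums are all hypothetical. Handling of the null and default partitions is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define NDATUMS 4
#define ALL_DATUMS (((uint32_t) 1 << NDATUMS) - 1)

typedef enum
{
	COMBINE_UNION,
	COMBINE_INTERSECT,
	COMBINE_INVERT
} CombineOp;

/* Hypothetical list-partition bound datums. */
static const int datum_values[NDATUMS] = {10, 20, 30, 40};

/* Base step for "a = value": bitmask of the matching datum indexes. */
static uint32_t
op_step_eq(int value)
{
	uint32_t	result = 0;
	int			i;

	for (i = 0; i < NDATUMS; i++)
		if (datum_values[i] == value)
			result |= (uint32_t) 1 << i;
	return result;
}

/* Combine the results of earlier steps according to 'op'. */
static uint32_t
combine_step(CombineOp op, const uint32_t *args, int nargs)
{
	uint32_t	result = ALL_DATUMS;
	int			i;

	switch (op)
	{
		case COMBINE_UNION:
			result = 0;
			for (i = 0; i < nargs; i++)
				result |= args[i];
			break;
		case COMBINE_INTERSECT:
			for (i = 0; i < nargs; i++)
				result &= args[i];
			break;
		case COMBINE_INVERT:
			/* In this sketch, invert takes exactly one argument. */
			assert(nargs == 1);
			result = ~args[0] & ALL_DATUMS;
			break;
	}
	return result;
}

/* "a <> v1 AND a <> v2" as invert(union(eq(v1), eq(v2))). */
static uint32_t
prune_not_equal_both(int v1, int v2)
{
	uint32_t	eq[2];
	uint32_t	unioned;

	eq[0] = op_step_eq(v1);
	eq[1] = op_step_eq(v2);
	unioned = combine_step(COMBINE_UNION, eq, 2);
	return combine_step(COMBINE_INVERT, &unioned, 1);
}
```

The point of the sketch is that <> needs no dedicated step type: ordinary equality steps plus a generic invert are enough to describe the surviving datums.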
Attached is the updated version of the patches.
Thanks,
Amit
Attachments:
v44-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 7f61cd8c1c43b0f6f40d1a63fd2f874806582084 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v44 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0231f8bf7c..30459f7ba9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v44-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From d1ba19704d115b4d6e17bda17b5118a152d0bb26 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v44 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v44-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 2bd9b37a5576f54d213699f53dacfb0841f1a7fe Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v44 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
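
As a rough illustration of the step-based evaluation that
get_matching_partitions() performs in the diff below, here is a minimal
standalone sketch. It is not the patch's code: Step, StepKind and
run_steps() are hypothetical names, a uint32_t stands in for a Bitmapset
of bound offsets, and the null and default partitions are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's pruning step nodes. */
typedef enum { STEP_BASE, STEP_UNION, STEP_INTERSECT, STEP_INVERT } StepKind;

typedef struct
{
    StepKind    kind;
    uint32_t    base_match;     /* STEP_BASE: bit i set if bound offset i matches */
    int         sources[4];     /* combine steps: ids of earlier steps */
    int         nsources;
} Step;

/*
 * Evaluate the steps in order.  Each step stores its result in its own
 * slot; a combine step may only read slots of earlier steps.  The result
 * of the last step is the overall answer.
 */
static uint32_t
run_steps(const Step *steps, int nsteps, int nbounds)
{
    uint32_t    results[16];
    uint32_t    all = (nbounds >= 32) ? UINT32_MAX
                                      : ((UINT32_C(1) << nbounds) - 1);

    assert(nsteps <= 16);

    /* No steps at all means nothing can be pruned. */
    if (nsteps == 0)
        return all;

    for (int i = 0; i < nsteps; i++)
    {
        const Step *s = &steps[i];
        uint32_t    r = 0;

        if (s->kind == STEP_BASE)
            r = s->base_match & all;
        else if (s->nsources == 0)
            r = all;            /* combine step without arguments: no pruning */
        else
        {
            for (int j = 0; j < s->nsources; j++)
                assert(s->sources[j] < i);  /* must refer to an earlier step */

            if (s->kind == STEP_UNION)
            {
                for (int j = 0; j < s->nsources; j++)
                    r |= results[s->sources[j]];
            }
            else if (s->kind == STEP_INTERSECT)
            {
                r = all;
                for (int j = 0; j < s->nsources; j++)
                    r &= results[s->sources[j]];
            }
            else
            {
                /* STEP_INVERT takes exactly one source. */
                assert(s->nsources == 1);
                r = all & ~results[s->sources[0]];
            }
        }
        results[i] = r;
    }

    return results[nsteps - 1];
}
```

With two base steps matching offsets {0,1} and {1,2}, an intersect step
over both leaves only offset 1, while a union step leaves {0,1,2}. The
real code additionally tracks scan_null and scan_default alongside the
offsets and maps offsets to partition indexes at the end.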
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1144 +++++++++++++++++
src/backend/nodes/copyfuncs.c | 36 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1694 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 74 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 +++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
18 files changed, 3372 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..363b12836b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,18 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ Bitmapset *datum_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +209,23 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1649,1124 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for the pruning steps to store their results. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The result
+ * of applying all pruning steps is the value contained in the slot of the
+ * last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose
+ * corresponding partitions need to be in the result, including special
+ * null-accepting and default partitions. Collect the actual partition
+ * indexes now.
+ */
+ i = -1;
+ result = NULL;
+ last_step_result = step_results[num_steps - 1];
+ while ((i = bms_next_member(last_step_result->datum_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ if (partition_bound_accepts_nulls(context->boundinfo))
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ if (partition_bound_has_default(context->boundinfo))
+ result = bms_add_member(result,
+ context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * The result also indicates whether the special null-accepting and/or
+ * default partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_matching_*_bound(s) functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ partsupfunc[keyno] = context->partsupfunc[keyno];
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (cstep->source_stepids == NIL)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->datum_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /*
+ * Add indexes of datums given by the argument step's
+ * result.
+ */
+ result->datum_offsets =
+ bms_add_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->datum_offsets = step_result->datum_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Only keep indexes of datums that are in argument
+ * step's result.
+ */
+ result->datum_offsets =
+ bms_int_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ case COMBINE_INVERT:
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int source_step_id;
+ PruneStepResult *source;
+
+ /*
+ * There should only ever be one source step to invert the
+ * result of.
+ */
+ Assert(list_length(cstep->source_stepids) == 1);
+ source_step_id = linitial_int(cstep->source_stepids);
+ if (source_step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ source = step_results[source_step_id];
+ Assert(source != NULL);
+
+ /* First add possible datum offsets. */
+ result->datum_offsets =
+ bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ /* Then remove from it those present in the source. */
+ result->datum_offsets =
+ bms_del_members(result->datum_offsets,
+ source->datum_offsets);
+ Assert(!source->scan_null ||
+ partition_bound_accepts_nulls(boundinfo));
+ result->scan_null = !source->scan_null;
+ Assert(!source->scan_default ||
+ partition_bound_has_default(boundinfo));
+ result->scan_default = !source->scan_default;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bound
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created.
+ *
+ * Since hash partitioning has neither of the special partitions (null and
+ * default), scan_null and scan_default are not set.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->datum_offsets = bms_make_singleton(rowHash % greatest_modulus);
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * If special partitions (null and default) need to be scanned for given
+ * values, set scan_null and scan_default in result if present.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ {
+ result->scan_null = true;
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ {
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->datum_offsets = bms_make_singleton(off);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If the default partition needs to be scanned for the given values, set
+ * scan_default in result, provided a default partition exists.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts && partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case it couldn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ if (partindices[i] < 0)
+ result->scan_default = true;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ /* none qualifies. */
+ else
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ if (partindices[i] < 0)
+ result->scan_default = true;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return result;
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..d0ab4273c8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,36 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5054,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..52de893e89 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2943,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..48fdeacac3
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1694 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate steps needed for partition pruning
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For a BoolExpr clause, we recursively generate steps for each of its
+ * arguments and generate a PartitionPruneStepCombine step that will combine
+ * the results of those steps.
+ *
+ * All of the generated steps are added to the list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find any that is marked pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps means the arg wasn't a clause matching
+ * this partition key. We cannot prune using such an
+ * arg. To indicate that to the pruning code, we must
+ * construct a PartitionPruneStepCombine and set the
+ * source_stepids to an empty List.
+ *
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ * What we do is we convert what would originally be:
+ *
+ * ne_clause1 AND ne_clause2 .. AND ne_clauseN
+ *
+ * into:
+ *
+ * NOT (eq_clause1 OR eq_clause2 .. OR eq_clauseN)
+ *
+ * where each eq_clause is constructed with the valid negator of the
+ * <> operator appearing in the corresponding ne_clause.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *step_ids = NIL;
+ PartitionPruneStep *unionStep,
+ *diffStep,
+ *nullpartStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+
+ /*
+ * Generate an opstep using what must be a btree = operator, that
+ * is, the negator of <> originally appearing in the clause.
+ */
+ step = (PartitionPruneStep *)
+ generate_pruning_step_op(context,
+ BTEqualStrategyNumber,
+ list_make1(pc->expr),
+ list_make1_oid(pc->cmpfn),
+ NULL);
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ /*
+ * Moreover, we must add an explicit step as an argument of the union
+ * step being built to select the NULL-only partition (if any), so
+ * that it is excluded from the final result by subsequent inversion.
+ * That's because all these <> clauses are strict and hence won't
+ * select any records of the NULL-only partition.
+ */
+ Assert(part_scheme->partnatts == 1);
+ nullpartStep = (PartitionPruneStep *)
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ bms_make_singleton(0));
+ step_ids = lappend_int(step_ids, nullpartStep->step_id);
+
+ /* Combine all opsteps above using a UNION combine step first. */
+ unionStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_UNION);
+ /* Then slap on an INVERT combine step. */
+ diffStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ list_make1_int(unionStep->step_id),
+ COMBINE_INVERT);
+
+ result = lappend(result, diffStep);
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases (depending on the type of partitioning being used), if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * Depending on this clause's strategy, clauses for the
+ * next key may not be needed.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* Keep considering the next key only if some strategy still needs it. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys wouldn't
+ * be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys,
+ * which get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell the hash partition bound
+ * search function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ /* Accumulate the steps generated for every clause, not just the last. */
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ int procnum = (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+ ? HASHEXTENDED_PROC
+ : BTORDER_PROC;
+
+ cmpfn = get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, procnum);
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause, or its negation, matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to context->steps and receives a unique identifier
+ * that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30459f7ba9..155be722f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..adb0d3a45f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a71d729e72 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT,
+ COMBINE_INVERT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if either a value or an IS NULL clause is provided for each key
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if either a value or an IS NULL clause is provided for each key
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..5006babc6c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1584,6 +1585,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1598,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1749,6 +1755,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
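As a reading aid for the patch above, here is a minimal standalone sketch of how a list of pruning steps might be evaluated: "op" steps yield a set of partitions, and "combine" steps union or intersect the results of earlier steps named in source_stepids. This is not the patch's code. All Toy* names are hypothetical, the partition sets are plain bitmasks rather than Bitmapsets, the op-step results are supplied by hand instead of being computed from partition bounds, COMBINE_INVERT is left out, and treating the last step's result as the final answer is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the patch's pruning step nodes. */
typedef enum { TOY_STEP_OP, TOY_STEP_COMBINE } ToyStepKind;
typedef enum { TOY_COMBINE_UNION, TOY_COMBINE_INTERSECT } ToyCombineOp;

#define TOY_MAX_STEPS   32
#define TOY_MAX_SOURCES 4

typedef struct ToyStep
{
    ToyStepKind  kind;
    uint32_t     op_result;     /* op step: bit i set = partition i matches */
    ToyCombineOp combine_op;    /* combine step: how to merge the sources */
    int          nsources;
    int          source_stepids[TOY_MAX_SOURCES];
} ToyStep;

/*
 * Evaluate steps in step-id order (array index = step id) and return the
 * result of the last one.  Assumes every combine step only refers to steps
 * with a smaller id.
 */
static uint32_t
toy_get_matching_partitions(const ToyStep *steps, int nsteps, int nparts)
{
    uint32_t results[TOY_MAX_STEPS];
    uint32_t all = (nparts >= 32) ? UINT32_MAX : ((1u << nparts) - 1);
    int      i, j;

    /* Without usable steps nothing can be pruned. */
    if (nsteps <= 0 || nsteps > TOY_MAX_STEPS)
        return all;

    for (i = 0; i < nsteps; i++)
    {
        const ToyStep *s = &steps[i];

        if (s->kind == TOY_STEP_OP)
            results[i] = s->op_result & all;
        else
        {
            /* Identity element: all partitions for AND, none for OR. */
            uint32_t acc = (s->combine_op == TOY_COMBINE_INTERSECT) ? all : 0;

            for (j = 0; j < s->nsources; j++)
            {
                int src = s->source_stepids[j];

                assert(src >= 0 && src < i);
                if (s->combine_op == TOY_COMBINE_INTERSECT)
                    acc &= results[src];
                else
                    acc |= results[src];
            }
            results[i] = acc;
        }
    }

    return results[nsteps - 1];
}

/*
 * Models the regression query
 *   (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null)
 * on the four-partition table hp.  The per-arm results (hp3, hp2, hp0) are
 * copied from the expected output above, not computed here.
 */
static uint32_t
toy_hp_or_example(void)
{
    ToyStep steps[4] = {
        { .kind = TOY_STEP_OP, .op_result = 1u << 3 },  /* hp3 */
        { .kind = TOY_STEP_OP, .op_result = 1u << 2 },  /* hp2 */
        { .kind = TOY_STEP_OP, .op_result = 1u << 0 },  /* hp0 */
        { .kind = TOY_STEP_COMBINE, .combine_op = TOY_COMBINE_UNION,
          .nsources = 3, .source_stepids = { 0, 1, 2 } },
    };

    return toy_get_matching_partitions(steps, 4, 4);
}
```

Calling toy_hp_or_example() yields the bitmask for partitions 0, 2 and 3, which lines up with the hp0, hp2 and hp3 scans in the expected EXPLAIN output for that query. Keeping each step's result indexed by its step id is what lets a combine step refer to its inputs by id alone.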
v44-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From fa9b3e28a98666a067dbd5e09a27d07e6762c3b8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v44 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d0ab4273c8..04a7e1aa62 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2291,21 +2291,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5079,9 +5064,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a19f5d0c02..0fedb84ac9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -615,7 +615,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -630,6 +629,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1172,12 +1172,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1249,10 +1249,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1484,6 +1486,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1583,6 +1594,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1590,7 +1616,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6117,65 +6143,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb0d3a45f..e6b5770c74 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5006babc6c..d9dd2209dc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
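To illustrate what the 4/4 patch above does with partitioned_child_rels, here is a minimal standalone sketch of the "bubble up" idea: each partitioned rel starts its list with its own RT index, and the lists of children that survive pruning are concatenated into the parent's list. SketchRel, its fields, and the sketch_* functions are hypothetical stand-ins invented for this illustration, not the planner's RelOptInfo or any real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_RELS 32
#define SKETCH_MAX_CHILDREN 8

/* Hypothetical, heavily simplified stand-in for RelOptInfo. */
typedef struct SketchRel
{
	int			rti;			/* range table index */
	bool		partitioned;	/* is this a partitioned table? */
	bool		pruned;			/* proven empty, i.e. a "dummy" rel */
	size_t		nchildren;
	struct SketchRel *children[SKETCH_MAX_CHILDREN];
	size_t		nparted;		/* length of partitioned_child_rels */
	int			partitioned_child_rels[SKETCH_MAX_RELS];
} SketchRel;

/* Append src's list to dst's list; the sketch assumes it always fits. */
static void
sketch_concat(SketchRel *dst, const SketchRel *src)
{
	for (size_t i = 0; i < src->nparted; i++)
	{
		assert(dst->nparted < SKETCH_MAX_RELS);
		dst->partitioned_child_rels[dst->nparted++] =
			src->partitioned_child_rels[i];
	}
}

/*
 * Start a partitioned rel's list with its own index, then bubble up the
 * lists of the children that were not pruned.
 */
static void
sketch_collect_partitioned_rels(SketchRel *rel)
{
	assert(rel != NULL);

	rel->nparted = 0;
	if (rel->partitioned)
		rel->partitioned_child_rels[rel->nparted++] = rel->rti;

	for (size_t i = 0; i < rel->nchildren; i++)
	{
		SketchRel  *child = rel->children[i];

		if (child->pruned)
			continue;			/* pruned subtrees contribute nothing */

		sketch_collect_partitioned_rels(child);
		if (rel->partitioned)
			sketch_concat(rel, child);
	}
}
```

With a root (RT index 1) whose partitioned child 2 is pruned and whose partitioned child 3 is live, the root's list ends up as {1, 3}, rather than every partitioned table in the tree.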
On 29 March 2018 at 21:35, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Beside fixing that, I have decided to get rid of the
PartitionPruneStepOpNe (a special kind of base pruning step that was
being used to prune list partitions using a set of <> operator clauses)
and related functions. Instead pruning for <> operator clauses is now
implemented by using a combination of PartitionPruneStepOp and
PartitionPruneStepCombine after adding a new combine op COMBINE_INVERT (I
also renamed COMBINE_OR and COMBINE_AND to COMBINE_UNION and
COMBINE_INTERSECT, respectively). I decided to do so because the previous
arrangement looked like a "hack" to support a special case, and it touched
quite a few places.
Hi Amit,
I've looked at the v44 patch. Thanks for making those changes.
The new not-equal handling code is not quite right.
DROP TABLE listp;
CREATE TABLE listp (a INT) PARTITION BY LIST(a);
CREATE TABLE listp1_3 PARTITION OF listp FOR VALUES IN(1,3);
CREATE TABLE listp_default PARTITION OF listp DEFAULT;
EXPLAIN SELECT * FROM listp WHERE a <> 1;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..54.56 rows=2537 width=4)
-> Seq Scan on listp1_3 (cost=0.00..41.88 rows=2537 width=4)
Filter: (a <> 1)
(3 rows)
The default should be included here.
INSERT INTO listp VALUES(1),(2),(3);
SELECT * FROM listp WHERE a <> 1;
a
---
3
(1 row)
This code assumes it's fine to just reverse the setting for default:
result->scan_default = !source->scan_default;
More complex handling is needed here.
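To make the problem concrete, here is a small standalone sketch, not the patch's code. SketchPruneResult is a hypothetical stand-in for PruneStepResult that uses a 32-bit mask in place of a Bitmapset. It compares the naive inversion with one conservative alternative that keeps the default partition whenever one exists. The alternative rests on the assumption that the source step only represents equality matches, so it may not be the handling that is eventually needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PruneStepResult. */
typedef struct SketchPruneResult
{
	uint32_t	datum_offsets;	/* bit i set: bound datum i may match */
	bool		scan_default;	/* must the default partition be scanned? */
} SketchPruneResult;

/* Mask with one bit set for each of the ndatums bound datums. */
static uint32_t
sketch_all_offsets(int ndatums)
{
	assert(ndatums >= 0 && ndatums <= 32);
	return (ndatums == 32) ? UINT32_MAX : ((UINT32_C(1) << ndatums) - 1);
}

/* Naive inversion: flips scan_default along with the datum set. */
static SketchPruneResult
sketch_invert_naive(SketchPruneResult source, int ndatums)
{
	SketchPruneResult result;

	result.datum_offsets = sketch_all_offsets(ndatums) & ~source.datum_offsets;
	result.scan_default = !source.scan_default;
	return result;
}

/*
 * Conservative inversion (an assumption of this sketch): values that match
 * no bound datum still satisfy "a <> const", so keep the default partition
 * whenever the table has one, whatever the source step said.
 */
static SketchPruneResult
sketch_invert_conservative(SketchPruneResult source, int ndatums,
						   bool has_default)
{
	SketchPruneResult result;

	result.datum_offsets = sketch_all_offsets(ndatums) & ~source.datum_offsets;
	result.scan_default = has_default;
	return result;
}
```

For listp, with bound datums {1, 3} at offsets 0 and 1, a source result for a = 1 that happens to have scan_default set would lose the default partition under the naive version, which matches the missing row 2 above.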
I've attached a diff for a small set of other things I noticed while reviewing.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v44_drowley_review.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 363b12836b..79cda9cecd 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1673,9 +1673,9 @@ get_matching_partitions(PartitionPruneContext *context,
/*
* Allocate space for individual pruning steps to store its result. Each
* slot will hold a PruneStepResult after performing a given pruning step.
- * Later steps may use the result of one or more earlier steps. Result of
- * of applying all pruning steps is the value contained in the slot of the
- * last pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
*/
step_results = (PruneStepResult **)
palloc0(num_steps * sizeof(PruneStepResult *));
@@ -1705,10 +1705,9 @@ get_matching_partitions(PartitionPruneContext *context,
}
/*
- * At this point we know that offsets of all the datums whose
- * corresponding partitions need to be in the result, including special
- * null-accepting and default partitions. Collect the actual partition
- * indexes now.
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
*/
i = -1;
result = NULL;
@@ -1819,7 +1818,9 @@ perform_pruning_base_step(PartitionPruneContext *context,
if (cmpfn != context->partsupfunc[keyno].fn_oid)
fmgr_info(cmpfn, &partsupfunc[keyno]);
else
- partsupfunc[keyno] = context->partsupfunc[keyno];
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
values[keyno] = datum;
nvalues++;
@@ -1985,11 +1986,11 @@ perform_pruning_combine_step(PartitionPruneContext *context,
source = step_results[source_step_id];
Assert(source != NULL);
- /* First add possible datum offsets. */
+ /* First add all possible datum offsets. */
result->datum_offsets =
bms_add_range(NULL, 0,
boundinfo->ndatums - 1);
- /* Then remove from it those present in the source. */
+ /* Then remove the members present in source. */
result->datum_offsets =
bms_del_members(result->datum_offsets,
source->datum_offsets);
@@ -2037,7 +2038,7 @@ partkey_datum_from_expr(PartitionPruneContext *context,
* Determine offset of the hash bound matching the specified value,
* considering that all the non-null values come from clauses containing
* a compatible hash eqaulity operator and any keys that are null come
- * from a IS NULL clause
+ * from an IS NULL clause
*
* In most cases, the result would contain just one bound's offset, although
* the set may be empty if the corresponding hash partition has not been
@@ -2149,17 +2150,10 @@ get_matching_list_bounds(PartitionPruneContext *context,
* the former doesn't exist.
*/
if (partition_bound_accepts_nulls(boundinfo))
- {
result->scan_null = true;
- return result;
- }
else if (partition_bound_has_default(boundinfo))
- {
result->scan_default = true;
- return result;
- }
- else
- return result;
+ return result;
}
/*
@@ -2335,7 +2329,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
/*
- * If there are no datums to compare keys with, or if we got a IS NULL
+ * If there are no datums to compare keys with, or if we got an IS NULL
* clause just return the default partition, if it exists.
*/
if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
@@ -2506,8 +2500,13 @@ get_matching_range_bounds(PartitionPruneContext *context,
if (partition_bound_has_default(boundinfo))
{
for (i = minoff; i <= maxoff; i++)
+ {
if (partindices[i] < 0)
+ {
result->scan_default = true;
+ break;
+ }
+ }
}
Assert(minoff >= 0 && maxoff >= 0);
@@ -2517,24 +2516,17 @@ get_matching_range_bounds(PartitionPruneContext *context,
else if (off >= 0) /* !is_equal */
{
/*
- * Look-up value falls in the range between some bounds in
- * boundinfo. off would be the offset of the greatest
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
* bound that is <= look-up value, so add off + 1 to the
* result instead as the offset of the upper bound of the
* only partition that may contain the look-up value.
*/
if (partindices[off + 1] >= 0)
- {
result->datum_offsets = bms_make_singleton(off + 1);
- return result;
- }
else if (partition_bound_has_default(boundinfo))
- {
result->scan_default = true;
- return result;
- }
- else
- return result;
+ return result;
}
/*
* off < 0, meaning the look-up value is smaller that all bounds,
@@ -2683,8 +2675,8 @@ get_matching_range_bounds(PartitionPruneContext *context,
maxoff = inclusive ? off + 1: off;
}
/*
- * Look-up value falls in the range between some bounds in
- * boundinfo. off would be the offset of the greatest
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
* bound that is <= look-up value, so add off + 1 to the
* result instead as the offset of the upper bound of the
* greatest partition that may contain look-up value. If
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 48fdeacac3..2dd4a3ecf3 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -254,7 +254,7 @@ generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
* If when going through clauses, we find any that are marked as pseudoconstant
* and contains a constant false value, we stop generating any further steps
* and simply return NIL (that is, no pruning steps) after setting *constfalse
- * to true. Caller should consider all partitions as pruned in that case.
+ * to true. The caller should consider all partitions as pruned in that case.
* We may do the same if we find that mutually contradictory clauses are
* present, but were not turned into a pseudoconstant at higher levels.
*
@@ -593,7 +593,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
unionStep = (PartitionPruneStep *)
generate_pruning_step_combine(context, step_ids,
COMBINE_UNION);
- /* Then slap on a DIFFERENCE combine step. */
+ /* Now add a step to invert the results. */
diffStep = (PartitionPruneStep *)
generate_pruning_step_combine(context,
list_make1_int(unionStep->step_id),
@@ -603,7 +603,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
}
/*
- * generate_opsteps set to false means no OpExprs were directly presemt in
+ * generate_opsteps set to false means no OpExprs were directly present in
* the input list.
*/
if (!generate_opsteps)
@@ -802,8 +802,8 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
}
/*
- * If we've decided that clauses for subsequent partition keys would't
- * be useful for pruning, don't look.
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
*/
if (!consider_next_key)
break;
Hi,
I think there's a bug in generate_pruning_steps_from_opexprs, which does
this for PARTITION_STRATEGY_HASH:
for_each_cell(lc1, lc)
{
pc = lfirst(lc1);
/*
* Note that we pass nullkeys for step_nullkeys,
* because we need to tell hash partition bound search
* function which of the keys are NULL.
*/
Assert(pc->op_strategy == HTEqualStrategyNumber);
pc_steps =
get_steps_using_prefix(context,
HTEqualStrategyNumber,
pc->expr,
pc->cmpfn,
pc->keyno,
nullkeys,
prefix);
}
opsteps = list_concat(opsteps, list_copy(pc_steps));
Notice that the list_concat() is outside the for_each_cell loop. Doesn't
that mean we fail to consider some of the clauses (all except the very
last clause) for pruning? I haven't managed to come up with an example,
but I haven't tried very hard.
FWIW I've noticed this because gcc complains that pc_steps might be used
uninitialized:
partprune.c: In function ‘generate_partition_pruning_steps_internal’:
partprune.c:992:16: warning: ‘pc_steps’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
opsteps = list_concat(opsteps, list_copy(pc_steps));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
partprune.c:936:14: note: ‘pc_steps’ was declared here
List *pc_steps;
^~~~~~~~
All of PostgreSQL successfully made. Ready to install.
So even if it's not a bug, we probably need to fix the code somehow.
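As a rough illustration of the concern, here is a standalone sketch with plain arrays standing in for the planner's List. SketchStepList and sketch_steps_for_clause() are hypothetical and only mimic the shape of the loop, with one step produced per clause. Concatenating after the loop keeps only the last clause's steps, while concatenating inside the loop keeps all of them and also leaves nothing uninitialized when the loop runs zero times.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_STEPS 16

/* Hypothetical stand-in for a List of pruning step ids. */
typedef struct SketchStepList
{
	size_t		n;
	int			ids[SKETCH_MAX_STEPS];
} SketchStepList;

/* Append all of src's ids to dst; the sketch assumes they fit. */
static void
sketch_concat_steps(SketchStepList *dst, const SketchStepList *src)
{
	for (size_t i = 0; i < src->n; i++)
	{
		assert(dst->n < SKETCH_MAX_STEPS);
		dst->ids[dst->n++] = src->ids[i];
	}
}

/* Hypothetical stand-in for get_steps_using_prefix(): one step per clause. */
static SketchStepList
sketch_steps_for_clause(int clause_id)
{
	SketchStepList steps = {0};

	steps.ids[steps.n++] = clause_id;
	return steps;
}

/* Shape of the questioned code: concatenation happens after the loop. */
static SketchStepList
sketch_concat_after_loop(const int *clauses, size_t nclauses)
{
	SketchStepList opsteps = {0};
	SketchStepList pc_steps = {0};	/* initialized only to keep the sketch defined */

	for (size_t i = 0; i < nclauses; i++)
		pc_steps = sketch_steps_for_clause(clauses[i]);
	sketch_concat_steps(&opsteps, &pc_steps);
	return opsteps;
}

/* Possible fix: accumulate each clause's steps inside the loop. */
static SketchStepList
sketch_concat_inside_loop(const int *clauses, size_t nclauses)
{
	SketchStepList opsteps = {0};

	for (size_t i = 0; i < nclauses; i++)
	{
		SketchStepList pc_steps = sketch_steps_for_clause(clauses[i]);

		sketch_concat_steps(&opsteps, &pc_steps);
	}
	return opsteps;
}
```

Whether more than one clause can actually reach this loop in the hash case is a separate question; the sketch only shows what each placement of the concatenation would keep.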
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks Tomas for the review.
On 2018/03/30 1:55, Tomas Vondra wrote:
Hi,
I think there's a bug in generate_pruning_steps_from_opexprs, which does
this for PARTITION_STRATEGY_HASH:
for_each_cell(lc1, lc)
{
pc = lfirst(lc1);
/*
* Note that we pass nullkeys for step_nullkeys,
* because we need to tell hash partition bound search
* function which of the keys are NULL.
*/
Assert(pc->op_strategy == HTEqualStrategyNumber);
pc_steps =
get_steps_using_prefix(context,
HTEqualStrategyNumber,
pc->expr,
pc->cmpfn,
pc->keyno,
nullkeys,
prefix);
}
opsteps = list_concat(opsteps, list_copy(pc_steps));
Notice that the list_concat() is outside the for_each_cell loop. Doesn't
that mean we fail to consider some of the clauses (all except the very
last clause) for pruning? I haven't managed to come up with an example,
but I haven't tried very hard.
FWIW I've noticed this because gcc complains that pc_steps might be used
uninitialized:
partprune.c: In function ‘generate_partition_pruning_steps_internal’:
partprune.c:992:16: warning: ‘pc_steps’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
opsteps = list_concat(opsteps, list_copy(pc_steps));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
partprune.c:936:14: note: ‘pc_steps’ was declared here
List *pc_steps;
^~~~~~~~
All of PostgreSQL successfully made. Ready to install.
So even if it's not a bug, we probably need to fix the code somehow.
Yeah, the code needs to be fixed. Although, it seems to me that in the
hash partitioning case, the loop would iterate at most once, at least if
the query didn't contain any Params. That's because, at that point, there
cannot be multiple mutually AND'd equality clauses referring to the same
key. For example, if there were such clauses in the original query and they contained
different values, we wouldn't get this far anyway as they would be reduced
to constant-false at an earlier planning stage. If they all contained the
same value (e.g. key = 1 and key = 1::smallint and a = 1::int and a =
1::bigint), then only one of them will be left in rel->baserestrictinfo
anyway. But we still need to have the loop because all of what I said
wouldn't happen if the clauses contained Params. In that case, the result
would be determined at execution time.
I have fixed the code as you suggested and will post the fixed version
shortly after fixing the issues David reported.
Thanks,
Amit
Hi David.
On 2018/03/29 20:08, David Rowley wrote:
I've looked at the v44 patch.
Thank you.
Thanks for making those changes.
The new not-equal handling code is not quite right.
DROP TABLE listp;
CREATE TABLE listp (a INT) PARTITION BY LIST(a);
CREATE TABLE listp1_3 PARTITION OF listp FOR VALUES IN(1,3);
CREATE TABLE listp_default PARTITION OF listp DEFAULT;
EXPLAIN SELECT * FROM listp WHERE a <> 1;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..54.56 rows=2537 width=4)
-> Seq Scan on listp1_3 (cost=0.00..41.88 rows=2537 width=4)
Filter: (a <> 1)
(3 rows)
The default should be included here.
INSERT INTO listp VALUES(1),(2),(3);
SELECT * FROM listp WHERE a <> 1;
a
---
3
(1 row)
Good catch! Indeed, the default partition should not have been pruned
away in this case.
This code assumes it's fine to just reverse the setting for default:
result->scan_default = !source->scan_default;
More complex handling is needed here.
Hmm, I thought about this and came to the conclusion that we should *always*
scan the default partition in this case. The inversion step removes all the
datums selected by the source step from the set of *all* datums that the
currently defined set of partitions allows. If there's a default partition
in the mix, that means the latter contains all the datums of the partition
key's data type. Irrespective of whether or not the source step selected
the default partition, there would be datums in the set after inversion
that belong to the default partition, if not to some non-default partition
that would've been selected. I have written a
comment there trying to explain this, but I may not have been able to
articulate it properly. Please check. Or does this sound just wrong?
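The rule proposed above can be sketched with a toy model. This is not the patch's code: ToyStepResult and toy_invert() are hypothetical, and a 32-bit mask stands in for the Bitmapset of datum offsets. The sketch assumes that a source step flags the default partition whenever its value matches no listed datum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PruneStepResult. */
typedef struct ToyStepResult
{
    uint32_t    datum_offsets;  /* bit i set => i-th bound datum selected */
    bool        scan_default;   /* default partition must be scanned */
} ToyStepResult;

/*
 * Invert 'source' against the set of all 'ndatums' bound datums.  The
 * default partition is scanned whenever one exists, regardless of what the
 * source step decided about it.
 */
static ToyStepResult
toy_invert(ToyStepResult source, int ndatums, bool has_default)
{
    ToyStepResult result;
    uint32_t    all;

    assert(ndatums >= 0 && ndatums <= 32);
    all = (ndatums == 32) ? UINT32_MAX : ((UINT32_C(1) << ndatums) - 1);

    result.datum_offsets = all & ~source.datum_offsets;
    result.scan_default = has_default;  /* not !source.scan_default */
    return result;
}
```

Take the listp example, whose bound datums are 1 (offset 0) and 3 (offset 1). Inverting a source step for a = 1 leaves offset 1 selected and keeps the default, which is where the row with a = 2 lives. Under the stated assumption, a source step for a = 2 would select no datum and flag the default; simply negating that flag would drop the default for a <> 2, even though a value such as 5 satisfies the clause and can only be stored there.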
I've attached a diff for a small set of other things I noticed while
reviewing.
Thanks, merged.
Please find attached the updated patches.
Thanks,
Amit
Attachments:
v45-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 7f61cd8c1c43b0f6f40d1a63fd2f874806582084 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v45 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0231f8bf7c..30459f7ba9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..f151646271 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v45-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From d1ba19704d115b4d6e17bda17b5118a152d0bb26 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v45 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v45-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From f66eac9f2c6229a5bfc03b0cc945249cfcaadbc9 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v45 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1161 +++++++++++++++++
src/backend/nodes/copyfuncs.c | 36 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1717 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 74 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 +++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
18 files changed, 3412 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..bcd282515a 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,18 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ Bitmapset *datum_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +209,23 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1649,1141 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for individual pruning steps to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ i = -1;
+ result = NULL;
+ last_step_result = step_results[num_steps - 1];
+ while ((i = bms_next_member(last_step_result->datum_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ if (partition_bound_accepts_nulls(context->boundinfo))
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ if (partition_bound_has_default(context->boundinfo))
+ result = bms_add_member(result,
+ context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also contains whether special null-accepting and/or default
+ * partition need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_partitions_from_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in a IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ return NULL;
+ }
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->datum_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /*
+ * Add indexes of datums given by the argument step's
+ * result.
+ */
+ result->datum_offsets =
+ bms_add_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->datum_offsets = bms_copy(step_result->datum_offsets);
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Only keep indexes of datums that are in argument
+ * step's result.
+ */
+ result->datum_offsets =
+ bms_int_members(result->datum_offsets,
+ step_result->datum_offsets);
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ case COMBINE_INVERT:
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int source_step_id;
+ PruneStepResult *source;
+
+ /*
+ * There should only ever be one source step to invert the
+ * result of.
+ */
+ Assert(list_length(cstep->source_stepids) == 1);
+ source_step_id = linitial_int(cstep->source_stepids);
+ if (source_step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ source = step_results[source_step_id];
+ Assert(source != NULL);
+
+ /* First add all possible datum offsets. */
+ result->datum_offsets =
+ bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ /* Then remove the members present in source. */
+ result->datum_offsets =
+ bms_del_members(result->datum_offsets,
+ source->datum_offsets);
+
+ /*
+ * Invert the source step's decision about whether to scan the
+ * null partition.
+ */
+ Assert(!source->scan_null ||
+ partition_bound_accepts_nulls(boundinfo));
+ result->scan_null = !source->scan_null;
+
+ /*
+ * Unlike other partitions, the set of values contained in
+ * the default partition is unspecified, so it does not
+ * make sense to determine whether or not to scan it by
+ * simply inverting what the source step would've decided.
+ * That's because the boundinfo does not explicitly
+ * contain the datums corresponding to the default
+ * partition. In fact, we should *always* scan the
+ * default partition in this case, because the set of
+ * datums after inversion, other than those that have a
+ * non-default partition defined, would still contain
+ * datums of the partition key's type that could only be
+ * in the default partition.
+ *
+ * XXX - the above reasoning only seems to apply if the
+ * table is list partitioned. Maybe we should Assert that
+ * it is. Currently, we generate a combine step with
+ * the inversion op only for a case that's supported for
+ * list partitioning.
+ */
+ result->scan_default = true;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bound
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created.
+ *
+ * Since there are neither of the special partitions (null and default) in
+ * case of hash partitioning, scan_null and scan_default are not set.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we have such clauses for all keys, which the planner must have
+ * found or we wouldn't have gotten here.
+ */
+ Assert(nvalues + bms_num_members(nullkeys) == partnatts);
+
+ /*
+ * If there are any values, they must have come from clauses containing
+ * an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->datum_offsets = bms_make_singleton(rowHash % greatest_modulus);
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * If special partitions (null and default) need to be scanned for given
+ * values, set scan_null and scan_default in result if present.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ {
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->datum_offsets = bms_make_singleton(off);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ break;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* Skip the bound at 'off' unless it is an exact, inclusive match. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the offset of the last datum we have a
+ * partition for, the only possible partition that could contain a
+ * match is the default partition, but we must've set
+ * result->scan_default above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is smaller than the offsets of all datums of non-default
+ * partitions, the only possible partition that could contain a match
+ * is the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result;
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts && partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ {
+ result->datum_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ else
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case it couldn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1, then would be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->datum_offsets = bms_make_singleton(off + 1);
+ else if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ return result;
+ }
+ /* none qualifies. */
+ else
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the comparison operator is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ if (partition_bound_has_default(boundinfo))
+ result->scan_default = true;
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ result->scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ if (partindices[i] < 0)
+ result->scan_default = true;
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return result;
+ result->datum_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
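
To make the offset arithmetic in get_matching_list_bounds() above easier to follow, here is a minimal standalone sketch of the same idea. It is only an illustration under simplifying assumptions: the bounds are plain sorted ints, null and default partitions are ignored, and every name in it (ToyStrategy, toy_list_bsearch, toy_list_match) is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the btree strategy numbers. */
typedef enum ToyStrategy { TOY_LT, TOY_LE, TOY_EQ, TOY_GE, TOY_GT } ToyStrategy;

/*
 * Return the offset of the greatest datum that is <= value, or -1 if all
 * datums are greater.  *is_equal is set if that datum equals value.
 * 'datums' must be sorted in ascending order without duplicates.
 */
static int
toy_list_bsearch(const int *datums, int ndatums, int value, bool *is_equal)
{
    int lo = -1;
    int hi = ndatums - 1;

    assert(ndatums >= 0);
    *is_equal = false;
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;

        if (datums[mid] <= value)
        {
            lo = mid;
            *is_equal = (datums[mid] == value);
            if (*is_equal)
                break;
        }
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Compute the inclusive range [*minoff, *maxoff] of datum offsets that
 * satisfy "key <strategy> value".  Returns false if no datum qualifies.
 */
static bool
toy_list_match(const int *datums, int ndatums, ToyStrategy strategy,
               int value, int *minoff, int *maxoff)
{
    bool is_equal;
    int  off = toy_list_bsearch(datums, ndatums, value, &is_equal);

    *minoff = 0;
    *maxoff = ndatums - 1;

    switch (strategy)
    {
        case TOY_EQ:
            if (off < 0 || !is_equal)
                return false;
            *minoff = *maxoff = off;
            return true;

        case TOY_GE:
        case TOY_GT:
            if (off < 0)
                off = 0;        /* all datums are greater than value */
            else if (!is_equal || strategy == TOY_GT)
                off++;          /* datum at 'off' itself does not qualify */
            if (off > ndatums - 1)
                return false;
            *minoff = off;
            return true;

        case TOY_LE:
        case TOY_LT:
            if (off >= 0 && is_equal && strategy == TOY_LT)
                off--;          /* exact match is excluded by < */
            if (off < 0)
                return false;
            *maxoff = off;
            return true;
    }
    return false;
}
```

In the patch itself, a "false" outcome here corresponds to returning a result with empty datum_offsets, with scan_default possibly already set.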
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..d0ab4273c8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,36 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5054,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
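
For readers who want to see how the two step node types copied above fit together, the following is a rough sketch of the union/intersect/invert semantics that perform_pruning_combine_step() applies to earlier step results. It is an illustration only: a uint64_t bitmask stands in for the Bitmapset, the empty-source-list case is left out, and the names ToyCombineOp, ToyStepResult and toy_combine are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum ToyCombineOp { TOY_UNION, TOY_INTERSECT, TOY_INVERT } ToyCombineOp;

/* Hypothetical stand-in for PruneStepResult. */
typedef struct ToyStepResult
{
    uint64_t datum_offsets;     /* bit i set => datum offset i selected */
    bool     scan_null;
    bool     scan_default;
} ToyStepResult;

/* Combine the results of 'nsrc' earlier steps; ndatums must be 1..63. */
static ToyStepResult
toy_combine(ToyCombineOp op, const ToyStepResult *src, int nsrc, int ndatums)
{
    ToyStepResult r = {0, false, false};
    uint64_t      all;
    int           i;

    assert(nsrc >= 1);
    assert(ndatums > 0 && ndatums < 64);
    all = (UINT64_C(1) << ndatums) - 1;

    switch (op)
    {
        case TOY_UNION:
            for (i = 0; i < nsrc; i++)
            {
                r.datum_offsets |= src[i].datum_offsets;
                r.scan_null = r.scan_null || src[i].scan_null;
                r.scan_default = r.scan_default || src[i].scan_default;
            }
            break;

        case TOY_INTERSECT:
            r = src[0];         /* struct copy, so src[0] is not modified */
            for (i = 1; i < nsrc; i++)
            {
                r.datum_offsets &= src[i].datum_offsets;
                r.scan_null = r.scan_null && src[i].scan_null;
                r.scan_default = r.scan_default && src[i].scan_default;
            }
            break;

        case TOY_INVERT:
            assert(nsrc == 1);
            r.datum_offsets = all & ~src[0].datum_offsets;
            r.scan_null = !src[0].scan_null;
            /* the default partition's contents are unknown: always scan */
            r.scan_default = true;
            break;
    }
    return r;
}
```

Because the toy result is copied by value, the intersect case cannot disturb the source step's result, which is the property a later combine step referencing the same source would rely on.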
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..52de893e89 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2943,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..3fd3cadb01 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -867,6 +868,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -874,6 +877,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1121,6 +1138,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..e5e6d7530b
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1717 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to match a given set of clauses with the partition
+ * key and to generate partition pruning "steps" from them
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned. Otherwise,
+ * they'd need to be delivered to the executor where the missing information
+ * can be filled and pruning tried one more time, which would be called
+ * "dynamic pruning".
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result match_clause_to_partition_key produces for a
+ * given clause and the partition key it is asked to match against
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions is also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the List context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, while going through clauses, we find any that is marked pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ * What we do is we convert what would originally be:
+ *
+ * ne_clause1 AND ne_clause2 .. AND ne_clauseN
+ *
+ * into:
+ *
+ * NOT (eq_clause1 OR eq_clause2 .. OR eq_clauseN)
+ *
+ * where each of the eq_clauses is constructed with the valid negator of
+ * the <> operator appearing in the corresponding ne_clause.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *step_ids = NIL;
+ PartitionPruneStep *unionStep,
+ *diffStep,
+ *nullpartStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+
+ /*
+ * Generate an opstep using what must be a btree = operator, that
+ * is, the negator of <> originally appearing in the clause.
+ */
+ step = (PartitionPruneStep *)
+ generate_pruning_step_op(context,
+ BTEqualStrategyNumber,
+ list_make1(pc->expr),
+ list_make1_oid(pc->cmpfn),
+ NULL);
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ /*
+ * Moreover, we must add an explicit step as an argument of the union
+ * step being built to select the NULL-only partition (if any), so
+ * that it is excluded from the final result by subsequent inversion.
+ * That's because all these <> clauses are strict and hence won't
+ * select any records of the NULL-only partition.
+ */
+ Assert(part_scheme->partnatts == 1);
+ nullpartStep = (PartitionPruneStep *)
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ bms_make_singleton(0));
+ step_ids = lappend_int(step_ids, nullpartStep->step_id);
+
+ /* Combine all opsteps above using a UNION combine step first. */
+ unionStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_UNION);
+ /* Now add a step to invert the results. */
+ diffStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ list_make1_int(unionStep->step_id),
+ COMBINE_INVERT);
+
+ result = lappend(result, diffStep);
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause's operator is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Following functions generate pruning steps of various types. Each step
+ * that's created is added to a global context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30459f7ba9..155be722f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..adb0d3a45f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a71d729e72 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' contain the same number of items,
+ * which is at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offset of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT,
+ COMBINE_INVERT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f151646271..ed0a885370 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 17bf55c1f5..5006babc6c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1584,6 +1585,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1598,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1749,6 +1755,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
Attachment: v45-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 5658860b6177d582d4364b287f91c833ede6aaa8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v45 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d0ab4273c8..04a7e1aa62 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2291,21 +2291,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5079,9 +5064,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..9ce40ee3b3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2230,7 +2230,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2255,6 +2254,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2304,6 +2304,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2529,16 +2530,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4074,9 +4065,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3fd3cadb01..03b94f6593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -877,6 +877,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1330,6 +1341,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1340,7 +1357,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1367,49 +1383,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1428,9 +1450,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a19f5d0c02..0fedb84ac9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -615,7 +615,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -630,6 +629,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1172,12 +1172,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1249,10 +1249,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1484,6 +1486,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1583,6 +1594,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1590,7 +1616,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6117,65 +6143,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb0d3a45f..e6b5770c74 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ed0a885370..b4219b2d57 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5006babc6c..d9dd2209dc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 30 March 2018 at 18:38, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached the updated patches.
Thanks.
I've noticed that there are no outfuncs or readfuncs for all the new
Step types you've added.
Also, the copy func does not properly copy the step_id in the base
node type. This will remain at 0 after a copyObject().
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 31 March 2018 at 01:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've noticed that there are no outfuncs or readfuncs for all the new
Step types you've added.
Also, the copy func does not properly copy the step_id in the base
node type. This will remain at 0 after a copyObject().
I've attached a quickly put together fix for this. I'm not quite sure
what the done thing is to copy/read/write nodes which inherit fields
from other nodes, so what I've done in the attached might not be
correct. However, I see _copyValue() does what I've done, so perhaps
it's fine, or that may just be more of a special case.
I also manually removed some hunks from the diff, so hopefully, it
still works correctly.
Attaching it as it may save you some time from doing it yourself.
Please check it though.
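To make the step_id problem concrete, here is a minimal, self-contained C sketch. The Demo* types and demo_copy_* functions are hypothetical stand-ins invented for illustration; they are not the PostgreSQL node types or the COPY_SCALAR_FIELD macro. calloc() is assumed to play the role of a node allocator that returns zeroed memory, which is why a field that is never copied reads back as 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base node: step_id lives here, not in the derived type. */
typedef struct DemoPruneStep
{
    int         step_id;
} DemoPruneStep;

/* Hypothetical derived node that embeds the base node as its first member. */
typedef struct DemoPruneStepCombine
{
    DemoPruneStep step;
    int         combineOp;
} DemoPruneStepCombine;

/* Buggy copy: only the derived type's own field is copied. */
static DemoPruneStepCombine *
demo_copy_buggy(const DemoPruneStepCombine *from)
{
    DemoPruneStepCombine *newnode;

    assert(from != NULL);
    newnode = calloc(1, sizeof(*newnode));  /* zeroed allocation */
    if (newnode == NULL)
        abort();
    newnode->combineOp = from->combineOp;
    return newnode;             /* step.step_id is still 0 */
}

/* Fixed copy: the field inherited from the base node is copied as well. */
static DemoPruneStepCombine *
demo_copy_fixed(const DemoPruneStepCombine *from)
{
    DemoPruneStepCombine *newnode;

    assert(from != NULL);
    newnode = calloc(1, sizeof(*newnode));
    if (newnode == NULL)
        abort();
    newnode->step.step_id = from->step.step_id;
    newnode->combineOp = from->combineOp;
    return newnode;
}
```

Copying a node whose step.step_id is 7 with demo_copy_buggy() gives back a node with step_id 0, while demo_copy_fixed() preserves the 7. The caller owns the returned memory and releases it with free().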
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_read_write_func_fixes.patch (application/octet-stream)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 54d38d99e0..b751081280 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -245,6 +245,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(partitioned_rels);
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(first_partial_plan);
+ COPY_NODE_FIELD(part_prune_infos);
return newnode;
}
@@ -2156,7 +2157,7 @@ static PartitionPruneStepCombine *
_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
{
PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
-
+ COPY_SCALAR_FIELD(step.step_id);
COPY_SCALAR_FIELD(combineOp);
COPY_NODE_FIELD(source_stepids);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ad502bc238..d288544fa9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1708,6 +1709,28 @@ _outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
WRITE_NODE_FIELD(exclRelTlist);
}
+static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
static void
_outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
{
@@ -3948,6 +3971,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_PartitionPruneInfo:
_outPartitionPruneInfo(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index f9bf335ad0..760bb40162 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1328,6 +1328,33 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
+
static PartitionPruneInfo *
_readPartitionPruneInfo(void)
{
@@ -2589,6 +2617,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("PARTITIONPRUNEINFO", 18))
return_value = _readPartitionPruneInfo();
else if (MATCH("RTE", 3))
David Rowley wrote:
Also, the copy func does not properly copy the step_id in the base
node type. This will remain at 0 after a copyObject()
As I recall, you can find these mistakes by compiling with
-DCOPY_PARSE_PLAN_TREES.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 31 March 2018 at 02:00, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 31 March 2018 at 01:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've noticed that there are no outfuncs or readfuncs for all the new
Step types you've added.
Also, the copy func does not properly copy the step_id in the base
node type. This will remain at 0 after a copyObject().
Attaching it as it may save you some time from doing it yourself.
Please check it though.
The attached might be slightly easier to apply. The previous version
was based on top of some other changes I'd been making.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_read_write_func_fixes_v2.patch (application/octet-stream)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 04a7e1aa62..f5634274a9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2156,7 +2156,7 @@ static PartitionPruneStepCombine *
_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
{
PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
-
+ COPY_SCALAR_FIELD(step.step_id);
COPY_SCALAR_FIELD(combineOp);
COPY_NODE_FIELD(source_stepids);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9ce40ee3b3..3f9e2585c7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1693,6 +1693,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
WRITE_NODE_FIELD(quals);
}
+static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
@@ -3924,6 +3946,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d02d4ec5b7..8348933151 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1328,6 +1328,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2572,6 +2598,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
On 30 March 2018 at 18:38, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached the updated patches.
There's a bit of a strange case with v45 around prepared statements.
I've not debugged this yet, but in case you get there first, here's
the case:
create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);
explain select * from listp where b in(1,2) and 2<>b and 0<>b; -- this one looks fine.
QUERY PLAN
----------------------------------------------------------------------------
Append (cost=0.00..49.66 rows=22 width=8)
-> Seq Scan on listp_1_1 (cost=0.00..49.55 rows=22 width=8)
Filter: ((b = ANY ('{1,2}'::integer[])) AND (2 <> b) AND (0 <> b))
(3 rows)
prepare q1 (int,int,int,int) as select * from listp where b in($1,$2)
and $3 <> b and $4 <> b;
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
QUERY PLAN
--------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: false
(2 rows)
My best guess is that something ate the bits out of a Bitmapset of the
matching partitions somewhere.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/03/30 22:41, David Rowley wrote:
On 31 March 2018 at 02:00, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 31 March 2018 at 01:18, David Rowley <david.rowley@2ndquadrant.com> wrote:
I've noticed that there are no outfuncs or readfuncs for all the new
Step types you've added.
Also, the copy func does not properly copy the step_id in the base
node type. This will remain at 0 after a copyObject().
Attaching it as it may save you some time from doing it yourself.
Please check it though.
The attached might be slightly easier to apply. The previous version
was based on top of some other changes I'd been making.
Thanks David. I have merged this.
Regards,
Amit
Hi David.
On 2018/03/31 0:55, David Rowley wrote:
On 30 March 2018 at 18:38, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Please find attached the updated patches.
There's a bit of a strange case with v45 around prepared statements.
I've not debugged this yet, but in case you get there first, here's
the case:
create table listp (a int, b int) partition by list (a);
create table listp_1 partition of listp for values in(1) partition by list (b);
create table listp_1_1 partition of listp_1 for values in(1);
create table listp_2 partition of listp for values in(2) partition by list (b);
create table listp_2_1 partition of listp_2 for values in(2);
explain select * from listp where b in(1,2) and 2<>b and 0<>b; -- this one looks fine.
QUERY PLAN
----------------------------------------------------------------------------
Append (cost=0.00..49.66 rows=22 width=8)
-> Seq Scan on listp_1_1 (cost=0.00..49.55 rows=22 width=8)
Filter: ((b = ANY ('{1,2}'::integer[])) AND (2 <> b) AND (0 <> b))
(3 rows)
prepare q1 (int,int,int,int) as select * from listp where b in($1,$2)
and $3 <> b and $4 <> b;
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
execute q1 (1,2,3,4);
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
QUERY PLAN
--------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: false
(2 rows)
My best guess is that something ate the bits out of a Bitmapset of the
matching partitions somewhere.
Hmm. It is the newly added inversion step that's causing this. When
creating a generic plan (that is when the planning happens via
BuildCachedPlan called with boundParams set to NULL), the presence of
Params will cause an inversion step's source step to produce a
scan-all-partitions sort of result, which the inversion step dutifully
inverts to a scan-no-partitions result.
I have tried to attack that problem by handling the
no-values-to-prune-with case using a side-channel to propagate the
scan-all-partitions result through possibly multiple steps. That is, a
base pruning step will set datum_offsets in a PruneStepResult only if
pruning is carried out by actually comparing values with the partition
bounds. If no values were provided (like in the generic plan case), it
will set a scan_all_nonnull flag instead and return without setting
datum_offsets. Combine steps perform their combining duty only if
datum_offsets contains a valid value, that is, if scan_all_nonnull is not set.
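The failure and the side-channel fix described above can be sketched in a few lines of standalone C. Everything here is a hypothetical miniature: DemoStepResult, its scan_all flag, and the demo_* functions are invented for illustration and are not the structures or functions in the patch. A uint32_t bitmask stands in for a Bitmapset, so the sketch assumes at most 32 bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of a pruning step's result. */
typedef struct DemoStepResult
{
    uint32_t    datum_offsets;  /* bit i set => bound i was matched */
    bool        scan_all;       /* no values to prune with: keep everything */
} DemoStepResult;

/*
 * Base step: records matching bounds only when a value is actually
 * available for comparison; otherwise it raises the flag and leaves
 * datum_offsets empty.
 */
static DemoStepResult
demo_base_step(bool have_value, uint32_t matching)
{
    DemoStepResult r = {0, false};

    if (have_value)
        r.datum_offsets = matching;
    else
        r.scan_all = true;
    return r;
}

/*
 * Naive inversion: blindly complements the source bits.  If the source
 * said "all bounds" only because nothing could be compared, the result is
 * "no bounds", which is the wrong answer seen with the generic plan.
 */
static uint32_t
demo_invert_naive(uint32_t source_bits, uint32_t all_bounds)
{
    return all_bounds & ~source_bits;
}

/* Flag-aware inversion: passes scan_all through instead of inverting. */
static DemoStepResult
demo_invert(DemoStepResult source, uint32_t all_bounds)
{
    DemoStepResult r = {0, false};

    if (source.scan_all)
    {
        r.scan_all = true;
        return r;
    }
    assert((source.datum_offsets & ~all_bounds) == 0);
    r.datum_offsets = all_bounds & ~source.datum_offsets;
    return r;
}

/* Final interpretation of a step result as the set of bounds to scan. */
static uint32_t
demo_bounds_to_scan(DemoStepResult r, uint32_t all_bounds)
{
    return r.scan_all ? all_bounds : (r.datum_offsets & all_bounds);
}
```

With two bounds (mask 0x3), demo_invert_naive(0x3, 0x3) yields 0, mirroring the "One-Time Filter: false" plan. Routing the same no-value case through demo_base_step(false, 0) and demo_invert() keeps both bounds, while a source step that really matched bound 1 (mask 0x2) still inverts to bound 0 (mask 0x1).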
Attached is an updated version of the patches.
Thanks,
Amit
Attachments:
v46-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 61b75e85fda5172dcd1ebafd831efeea2a49b587 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v46 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 0231f8bf7c..30459f7ba9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1877,7 +1877,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1895,7 +1896,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1913,6 +1914,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1947,6 +1961,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index ea5251c6be..6158df68dd 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v46-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 3729a2214c24adbd77a73669f85266e04e12294f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v46 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
Attachment: v46-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From 910b9ec06bc43475e1246d6be37e19c4dd808a6e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v46 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
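The step evaluation that the commit message alludes to can be pictured with a small stand-alone sketch. This is not the patch's code: it assumes a plain 32-bit mask in place of a Bitmapset, and the names OffsetSet, SketchStep and sketch_run_steps are hypothetical. It leaves out the scan_null, scan_default and scan_all_nonnull flags as well as the inversion op, and only illustrates that steps are evaluated in order, that a combine step may refer only to earlier steps, and that the last step's result is the final answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set = bound offset i selected. */
typedef uint32_t OffsetSet;

typedef enum { STEP_OP, STEP_UNION, STEP_INTERSECT } SketchStepKind;

typedef struct SketchStep
{
	SketchStepKind kind;
	OffsetSet	matched;		/* STEP_OP only: offsets the base step selected */
	int			nsources;		/* combine steps: number of source step ids */
	int			sources[4];		/* combine steps: ids of earlier steps */
} SketchStep;

/*
 * Evaluate 'steps' in array order and store the last step's result in *out.
 * With no steps at all, every offset survives.  Returns false if the input
 * is out of the sketch's limits or a combine step refers to a step that has
 * not been evaluated yet.
 */
static bool
sketch_run_steps(const SketchStep *steps, int nsteps, int noffsets,
				 OffsetSet *out)
{
	OffsetSet	results[16];
	OffsetSet	all;

	if (nsteps < 0 || nsteps > 16 || noffsets < 0 || noffsets > 32)
		return false;
	all = (noffsets == 32) ? UINT32_MAX : ((UINT32_C(1) << noffsets) - 1);

	if (nsteps == 0)
	{
		*out = all;
		return true;
	}

	for (int i = 0; i < nsteps; i++)
	{
		const SketchStep *s = &steps[i];
		OffsetSet	acc;

		if (s->kind == STEP_OP)
		{
			results[i] = s->matched & all;
			continue;
		}
		if (s->nsources < 0 || s->nsources > 4)
			return false;
		/* A combine step without sources means "do not prune anything". */
		if (s->nsources == 0)
		{
			results[i] = all;
			continue;
		}
		/* Union starts from the empty set, intersection from the full set. */
		acc = (s->kind == STEP_UNION) ? 0 : all;
		for (int j = 0; j < s->nsources; j++)
		{
			int			src = s->sources[j];

			if (src < 0 || src >= i)
				return false;	/* source must be an earlier step */
			if (s->kind == STEP_UNION)
				acc |= results[src];
			else
				acc &= results[src];
		}
		results[i] = acc;
	}

	*out = results[nsteps - 1];
	return true;
}
```

With two base steps selecting offsets {0, 1} and {1, 2}, an intersecting third step leaves only offset 1 and a union leaves offsets 0 through 2. The real module would still have to map the surviving bound offsets to partition indexes through boundinfo->indexes.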
src/backend/catalog/partition.c | 1222 ++++++++++++++++++
src/backend/nodes/copyfuncs.c | 36 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/nodes/outfuncs.c | 28 +
src/backend/nodes/readfuncs.c | 30 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1717 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 74 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 +++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
20 files changed, 3531 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..2a088b416b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,32 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ /*
+ * This contains the offsets of bounds (of those in a table's boundinfo),
+ * each of which is a bound whose corresponding partition is selected by a
+ * given pruning step. For a base pruning step, this value is only valid
+ * if it was obtained after comparing the values provided by the step with
+ * partition bounds.
+ */
+ Bitmapset *bound_offsets;
+
+ /*
+ * Set if we need to scan all partitions that contain non-null data; if
+ * this is set, bound_offsets should be NULL and its value should not be
+ * relied upon.
+ */
+ bool scan_all_nonnull;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +223,27 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static Bitmapset *get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_all_nonnull);
+static Bitmapset *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_null, bool *scan_default,
+ bool *scan_all_nonnull);
+static Bitmapset *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_default, bool *scan_all_nonnull);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1667,1184 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ Bitmapset *bound_offsets;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for individual pruning steps to store their results. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ Assert(!step_results[step->step_id]->scan_all_nonnull ||
+ step_results[step->step_id]->bound_offsets == NULL);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ last_step_result = step_results[num_steps - 1];
+ Assert(last_step_result != NULL);
+ if (last_step_result->scan_all_nonnull)
+ bound_offsets = bms_add_range(NULL, 0,
+ context->boundinfo->ndatums - 1);
+ else
+ bound_offsets = last_step_result->bound_offsets;
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ if (partition_bound_accepts_nulls(context->boundinfo))
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ if (partition_bound_has_default(context->boundinfo))
+ result = bms_add_member(result,
+ context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * The result also indicates whether the special null-accepting and/or
+ * default partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ PruneStepResult *result;
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_matching_*_bound(s) functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result->bound_offsets = get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys,
+ &result->scan_all_nonnull);
+ /*
+ * Since neither of the special partitions (null and default)
+ * exists in case of hash partitioning, scan_null and
+ * scan_default are not set.
+ */
+ result->scan_null = result->scan_default = false;
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result->bound_offsets = get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys,
+ &result->scan_null,
+ &result->scan_default,
+ &result->scan_all_nonnull);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result->bound_offsets = get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys,
+ &result->scan_default,
+ &result->scan_all_nonnull);
+ /* There is no special null-accepting range partition. */
+ result->scan_null = false;
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = NULL;
+ result->scan_all_nonnull = true;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ result->scan_all_nonnull = step_result->scan_all_nonnull;
+
+ /*
+ * Set bound_offsets if required. A bound's offset will
+ * be added to the results if it is present in either the
+ * source or the target.
+ */
+ if (!result->scan_all_nonnull)
+ result->bound_offsets =
+ bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ result->scan_all_nonnull =
+ step_result->scan_all_nonnull;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Set bound_offsets if required. A bound's offset
+ * will be added to the results only if it is present
+ * in both the source and the target.
+ */
+ if (!step_result->scan_all_nonnull)
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ case COMBINE_INVERT:
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int source_step_id;
+ PruneStepResult *source;
+
+ /*
+ * There should only ever be one source step to invert the
+ * result of.
+ */
+ Assert(list_length(cstep->source_stepids) == 1);
+ source_step_id = linitial_int(cstep->source_stepids);
+ if (source_step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ source = step_results[source_step_id];
+ Assert(source != NULL);
+
+ result->scan_all_nonnull = source->scan_all_nonnull;
+
+ /*
+ * Set bound_offsets if required. A bound's offset
+ * will be added to the results only if it is present
+ * in target but not the source.
+ */
+ if (!result->scan_all_nonnull)
+ {
+ /* First add all possible datum offsets. */
+ result->bound_offsets =
+ bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ /* Remove from it the members present in source. */
+ result->bound_offsets =
+ bms_del_members(result->bound_offsets,
+ source->bound_offsets);
+ }
+
+ /*
+ * Invert whether to scan the null partition relative to what
+ * the source step determined.
+ */
+ Assert(!source->scan_null ||
+ partition_bound_accepts_nulls(boundinfo));
+ result->scan_null = !source->scan_null;
+
+ /*
+ * Unlike other partitions, the set of values contained in
+ * the default partition is unspecified, so it does not
+ * make sense to determine whether or not to scan it by
+ * simply inverting what the source step would've decided.
+ * That's because the boundinfo does not explicitly
+ * contain the datums corresponding to the default
+ * partition. In fact, we should *always* scan the
+ * default partition in this case, because the set of
+ * datums after inversion, other than those that have a
+ * non-default partition defined, would still contain
+ * datums of the partition key's type that could only be
+ * in the default partition.
+ *
+ * XXX - the above reasoning only seems to apply if the
+ * table is list partitioned. Maybe we should Assert that
+ * it is. Currently, we generate a combine step with
+ * the inversion op only for a case that's supported for
+ * list partitioning.
+ */
+ result->scan_default = true;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bound
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains the values to be used for pruning, each stored at the
+ * array position of its partition key.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ *
+ * '*scan_all_nonnull' is set if all partitions containing non-null datums
+ * should be scanned
+ */
+static Bitmapset *
+get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_all_nonnull)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ *scan_all_nonnull = false;
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ return bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ *scan_all_nonnull = true;
+
+ return NULL;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ *
+ * '*scan_null' is set if the special null-accepting partition should be
+ * scanned
+ *
+ * '*scan_default' is set if the special default partition should be scanned
+ *
+ * '*scan_all_nonnull' is set if all partitions containing non-null datums
+ * should be scanned
+ */
+static Bitmapset *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_null, bool *scan_default,
+ bool *scan_all_nonnull)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ *scan_null = *scan_default = *scan_all_nonnull = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ *scan_null = true;
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ {
+ *scan_all_nonnull = true;
+ return NULL;
+ }
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ return bms_make_singleton(off);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set *scan_default above
+ * anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return NULL;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set *scan_default above
+ * anyway if one exists.
+ */
+ if (off < 0)
+ return NULL;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ return bms_add_range(NULL, minoff, maxoff);
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If the default partition needs to be scanned for the given values,
+ * *scan_default is set, provided a default partition is present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions
+ * used by partition_range_datum_bsearch and
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_default, bool *scan_all_nonnull)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ *scan_default = *scan_all_nonnull = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ return bms_add_range(NULL, minoff, maxoff);
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts && partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(off + 1);
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case it couldn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ *scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ return bms_add_range(NULL, minoff, maxoff);
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(off + 1);
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them; find the others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ *scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ *scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ *scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return NULL;
+ return bms_add_range(NULL, minoff, maxoff);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c7293a60d7..b019a50a84 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2133,6 +2133,36 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5024,6 +5054,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6c76c41ebe..52de893e89 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2146,6 +2146,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2932,6 +2943,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index f61ae03ac5..2333c8df96 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1694,6 +1694,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -3933,6 +3955,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d02d4ec5b7..8348933151 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1328,6 +1328,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2572,6 +2598,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..fd89c7cfee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1145,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..e5e6d7530b
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1717 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key and generate partition pruning "steps"
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned. Otherwise,
+ * they'd need to be delivered to the executor where the missing information
+ * can be filled and pruning tried one more time, which would be called
+ * "dynamic pruning".
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look-up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate a PartitionPruneStepCombine step that will combine
+ * the results of those steps.
+ *
+ * All of the generated steps are added to the shared list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find one that is marked as pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ * What we do is we convert what would originally be:
+ *
+ * ne_clause1 AND ne_clause2 .. AND ne_clauseN
+ *
+ * into:
+ *
+ * NOT (eq_clause1 OR eq_clause2 .. OR eq_clauseN)
+ *
+ * where each of the eq_clauses is constructed with the valid negator of the
+ * <> operator appearing in the corresponding ne_clause.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *step_ids = NIL;
+ PartitionPruneStep *unionStep,
+ *diffStep,
+ *nullpartStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+
+ /*
+ * Generate an opstep using what must be a btree = operator, that
+ * is, the negator of <> originally appearing in the clause.
+ */
+ step = (PartitionPruneStep *)
+ generate_pruning_step_op(context,
+ BTEqualStrategyNumber,
+ list_make1(pc->expr),
+ list_make1_oid(pc->cmpfn),
+ NULL);
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ /*
+ * Moreover, we must add an explicit step as an argument of the union
+ * step being built to select the NULL-only partition (if any), so
+ * that it is excluded from the final result by subsequent inversion.
+ * That's because all these <> clauses are strict and hence won't
+ * select any records of the NULL-only partition.
+ */
+ Assert(part_scheme->partnatts == 1);
+ nullpartStep = (PartitionPruneStep *)
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ bms_make_singleton(0));
+ step_ids = lappend_int(step_ids, nullpartStep->step_id);
+
+ /* Combine all opsteps above using a UNION combine step first. */
+ unionStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_UNION);
+ /* Now add a step to invert the results. */
+ diffStep = (PartitionPruneStep *)
+ generate_pruning_step_combine(context,
+ list_make1_int(unionStep->step_id),
+ COMBINE_INVERT);
+
+ result = lappend(result, diffStep);
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result = lappend(result,
+ generate_pruning_step_op(context, 0, NIL, NIL,
+ NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype, righttype;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ return PARTCLAUSE_UNSUPPORTED; /* no usable negator */
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this one
+ * is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ return (PartitionPruneStep *)
+ generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate list of PartitionPruneStepOp steps each consisting of given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ return list_make1(generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys));
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result = lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy,
+ step_exprs1,
+ step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to the global context->steps list and receives a
+ * globally unique identifier that's sourced from context->next_step_id.
+ */
+
+static Node *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (Node *) opstep;
+}
+
+static Node *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (Node *) cstep;
+}
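To illustrate how the union-then-invert arrangement that the code above builds for <> clauses on a list-partitioned table is meant to behave, here is a minimal standalone sketch. It is not part of the patch: the evaluator and every Sketch*/sketch_* name are hypothetical, partitions are modelled as bits in a 32-bit mask, and it assumes each list partition accepts exactly one value (plus one NULL-only partition). Steps are evaluated in step_id order, and a combine step only refers to steps with smaller ids, which mirrors how context->next_step_id is handed out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for op steps and the patch's combine operations. */
typedef enum
{
    SKETCH_LEAF,        /* result of an op step, given directly as a mask */
    SKETCH_UNION,
    SKETCH_INTERSECT,
    SKETCH_INVERT
} SketchStepKind;

#define SKETCH_MAX_STEPS   16
#define SKETCH_MAX_SOURCES 4

typedef struct SketchStep
{
    SketchStepKind kind;
    uint32_t    leaf_mask;  /* SKETCH_LEAF only: partitions selected */
    int         nsources;   /* combine steps: number of source step ids */
    int         sources[SKETCH_MAX_SOURCES];    /* ids of earlier steps */
} SketchStep;

/* Evaluate the steps in id order and return the mask of the last step. */
static uint32_t
sketch_eval_steps(const SketchStep *steps, int nsteps, int nparts)
{
    uint32_t    results[SKETCH_MAX_STEPS];
    uint32_t    all;
    int         i,
                j;

    assert(nsteps > 0 && nsteps <= SKETCH_MAX_STEPS);
    assert(nparts > 0 && nparts <= 32);
    all = (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);

    for (i = 0; i < nsteps; i++)
    {
        const SketchStep *s = &steps[i];
        uint32_t    r = 0;

        assert(s->nsources >= 0 && s->nsources <= SKETCH_MAX_SOURCES);
        switch (s->kind)
        {
            case SKETCH_LEAF:
                r = s->leaf_mask & all;
                break;
            case SKETCH_UNION:
                for (j = 0; j < s->nsources; j++)
                {
                    assert(s->sources[j] >= 0 && s->sources[j] < i);
                    r |= results[s->sources[j]];
                }
                break;
            case SKETCH_INTERSECT:
                r = all;
                for (j = 0; j < s->nsources; j++)
                {
                    assert(s->sources[j] >= 0 && s->sources[j] < i);
                    r &= results[s->sources[j]];
                }
                break;
            case SKETCH_INVERT:
                assert(s->nsources == 1);
                assert(s->sources[0] >= 0 && s->sources[0] < i);
                r = all & ~results[s->sources[0]];
                break;
        }
        results[i] = r;
    }

    return results[nsteps - 1];
}

/*
 * Partitions: bit 0 = ('a'), bit 1 = ('b'), bit 2 = ('c'), bit 3 = NULL-only.
 * Clause: key <> 'a' AND key <> 'c'.
 */
static uint32_t
sketch_prune_not_equal(void)
{
    const SketchStep steps[] = {
        {SKETCH_LEAF, UINT32_C(1) << 0, 0, {0}},    /* key = 'a' */
        {SKETCH_LEAF, UINT32_C(1) << 2, 0, {0}},    /* key = 'c' */
        {SKETCH_LEAF, UINT32_C(1) << 3, 0, {0}},    /* NULL-only partition */
        {SKETCH_UNION, 0, 3, {0, 1, 2}},
        {SKETCH_INVERT, 0, 1, {3}}
    };

    return sketch_eval_steps(steps, 5, 4);
}
```

Under those assumptions only the ('b') partition survives. Adding the NULL-only partition to the union before inverting is what keeps it out of the final result, which matches the reasoning in the comment about the <> clauses being strict.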
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30459f7ba9..155be722f6 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1865,6 +1874,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 443de22704..adb0d3a45f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -192,6 +192,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a71d729e72 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' contain the same number of items,
+ * which is at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For a BoolExpr clause, we combine the sets of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT,
+ COMBINE_INVERT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6158df68dd..7901c308f9 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index abc10a8ffd..86d1ccdbe5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1584,6 +1585,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1596,6 +1598,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1749,6 +1755,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
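The PartitionPruneStepOp and PartitionPruneStepCombine nodes added to primnodes.h in the patch above describe pruning as a list of steps, where a combine step refers to earlier steps by step_id. The following is only a toy sketch of how such a list could be evaluated; it is not the code in partition.c. All Toy*/toy_* names are hypothetical, a 32-bit mask stands in for Bitmapset, and each "op" step carries a precomputed set of matching partitions instead of doing a bound search. The demo encodes the hp query "(a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null)", using the per-clause partitions (hp3, hp2, hp0) shown in the regression output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_STEPS 32

/* Hypothetical stand-in for Bitmapset: bit i set => partition i is kept. */
typedef uint32_t ToyPartSet;

typedef enum { TOY_STEP_OP, TOY_STEP_COMBINE } ToyStepKind;
typedef enum { TOY_COMBINE_UNION, TOY_COMBINE_INTERSECT } ToyCombineOp;

/* One pruning step; a step's id is simply its index in the steps array. */
typedef struct ToyStep
{
    ToyStepKind  kind;
    ToyPartSet   matched;        /* TOY_STEP_OP: precomputed matches */
    ToyCombineOp combineOp;      /* TOY_STEP_COMBINE only */
    const int   *source_stepids; /* TOY_STEP_COMBINE: earlier step ids */
    int          nsources;
} ToyStep;

/*
 * Evaluate the steps in order and return the result of the last one.
 * A combine step may only refer to steps that precede it; any other
 * reference is treated as "all partitions", which never prunes wrongly.
 */
ToyPartSet
toy_get_matching_partitions(const ToyStep *steps, int nsteps, int nparts)
{
    ToyPartSet results[TOY_MAX_STEPS];
    ToyPartSet all;
    int        i, j;

    assert(nsteps > 0 && nsteps <= TOY_MAX_STEPS);
    assert(nparts > 0 && nparts <= 32);
    all = (nparts == 32) ? UINT32_MAX : ((UINT32_C(1) << nparts) - 1);

    for (i = 0; i < nsteps; i++)
    {
        const ToyStep *step = &steps[i];

        if (step->kind == TOY_STEP_OP)
        {
            results[i] = step->matched & all;
            continue;
        }

        /* Identity element: empty set for union, full set for intersect. */
        results[i] = (step->combineOp == TOY_COMBINE_INTERSECT) ? all : 0;
        for (j = 0; j < step->nsources; j++)
        {
            int        src = step->source_stepids[j];
            ToyPartSet s = (src >= 0 && src < i) ? results[src] : all;

            if (step->combineOp == TOY_COMBINE_INTERSECT)
                results[i] &= s;
            else
                results[i] |= s;
        }
    }
    return results[nsteps - 1];
}

/* The three-armed OR query on hp (4 partitions: hp0..hp3). */
ToyPartSet
toy_demo_hp_or_query(void)
{
    static const int sources[] = {0, 1, 2};
    const ToyStep steps[] = {
        {.kind = TOY_STEP_OP, .matched = UINT32_C(1) << 3},  /* hp3 */
        {.kind = TOY_STEP_OP, .matched = UINT32_C(1) << 2},  /* hp2 */
        {.kind = TOY_STEP_OP, .matched = UINT32_C(1) << 0},  /* hp0 */
        {.kind = TOY_STEP_COMBINE, .combineOp = TOY_COMBINE_UNION,
         .source_stepids = sources, .nsources = 3},
    };

    return toy_get_matching_partitions(steps, 4, 4);
}
```

Calling toy_demo_hp_or_query() yields a mask with bits 0, 2 and 3 set, matching the hp0/hp2/hp3 scans in the expected output. The sketch leaves out COMBINE_INVERT and default-partition handling.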
Attachment: v46-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 139ac3c831e0c4fe018c2e6f8f441bfbef0a96c4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v46 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time the set of affected tables is known.
That means we no longer need the PartitionedChildRelInfo node and
some of the related code in prepunion.c.
---
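As a rough illustration of the "bubble up" idea described in the commit message above (and implemented in the allpaths.c hunks of this patch), here is a toy sketch. It is not the planner code: ToyRel and the toy_* functions are hypothetical, a fixed-size int array stands in for the integer List, and is_dummy stands in for IS_DUMMY_REL(). Each partitioned rel starts with its own RT index and then appends the lists of its live (unpruned) children.

```c
#include <stddef.h>

#define TOY_MAX_RELS 16

/* Hypothetical, simplified stand-in for RelOptInfo. */
typedef struct ToyRel
{
    int             rti;             /* range table index */
    int             is_partitioned;  /* is it a partitioned table? */
    int             is_dummy;        /* pruned or proven empty */
    struct ToyRel **children;
    int             nchildren;
    int             partitioned_child_rels[TOY_MAX_RELS];
    int             npartitioned;
} ToyRel;

/*
 * Fill rel->partitioned_child_rels with rel's own index (if partitioned)
 * followed by the indexes bubbled up from its live children.
 */
void
toy_collect_partitioned_rels(ToyRel *rel)
{
    int i, j;

    rel->npartitioned = 0;
    if (rel->is_partitioned)
        rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
    {
        ToyRel *child = rel->children[i];

        /* Pruned children contribute nothing. */
        if (child->is_dummy)
            continue;

        toy_collect_partitioned_rels(child);

        /* Only partitioned parents accumulate their children's lists. */
        if (!rel->is_partitioned)
            continue;
        for (j = 0; j < child->npartitioned; j++)
        {
            if (rel->npartitioned < TOY_MAX_RELS)
                rel->partitioned_child_rels[rel->npartitioned++] =
                    child->partitioned_child_rels[j];
        }
    }
}

/*
 * Root (rti 1) has a leaf (2), a pruned sub-partitioned table (3, with
 * leaf 4) and a live sub-partitioned table (5, with leaf 6).
 * Copies the root's list into 'out' and returns its length.
 */
int
toy_demo_bubble_up(int *out, int outlen)
{
    ToyRel  leaf2 = {.rti = 2};
    ToyRel  leaf4 = {.rti = 4};
    ToyRel  leaf6 = {.rti = 6};
    ToyRel *kids3[] = {&leaf4};
    ToyRel *kids5[] = {&leaf6};
    ToyRel  sub3 = {.rti = 3, .is_partitioned = 1, .is_dummy = 1,
                    .children = kids3, .nchildren = 1};
    ToyRel  sub5 = {.rti = 5, .is_partitioned = 1,
                    .children = kids5, .nchildren = 1};
    ToyRel *kids1[] = {&leaf2, &sub3, &sub5};
    ToyRel  root = {.rti = 1, .is_partitioned = 1,
                    .children = kids1, .nchildren = 3};
    int     i;

    toy_collect_partitioned_rels(&root);
    for (i = 0; i < root.npartitioned && i < outlen; i++)
        out[i] = root.partitioned_child_rels[i];
    return root.npartitioned;
}
```

In the demo, the root ends up with the indexes 1 and 5 only; the pruned sub-partitioned table 3 is left out, which is the effect the commit message describes.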
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b019a50a84..f5634274a9 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2291,21 +2291,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5079,9 +5064,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 765b1be74b..164eff7363 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3187,9 +3177,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 2333c8df96..3f9e2585c7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2252,7 +2252,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2277,6 +2276,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2326,6 +2326,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2551,16 +2552,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4102,9 +4093,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd89c7cfee..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -884,6 +884,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1337,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1347,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1374,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1435,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b387c6213b..d40029dfa7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1173,12 +1173,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1250,10 +1250,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1485,6 +1487,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1584,6 +1595,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1591,7 +1617,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
subpaths,
subroots,
@@ -6114,65 +6140,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9d94..058fb24927 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index adb0d3a45f..e6b5770c74 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,7 +264,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7901c308f9..20cf4c0dfa 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2127,27 +2132,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d78b..e376a81359 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -58,9 +58,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 86d1ccdbe5..8af607ee42 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1608,7 +1608,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 2 April 2018 at 17:18, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/03/31 0:55, David Rowley wrote:
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
QUERY PLAN
--------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: false
(2 rows)
Hmm. It is the newly added inversion step that's causing this. When
creating a generic plan (that is when the planning happens via
BuildCachedPlan called with boundParams set to NULL), the presence of
Params will cause an inversion step's source step to produce
scan-all-partitions sort of result, which the inversion step dutifully
inverts to a scan-no-partitions result.
I have tried to attack that problem by handling the
no-values-to-prune-with case using a side-channel to propagate the
scan-all-partitions result through possibly multiple steps. That is, a
base pruning step will set datum_offsets in a PruneStepResult only if
pruning is carried out by actually comparing values with the partition
bounds. If no values were provided (like in the generic plan case), it
will set a scan_all_nonnull flag instead and return without setting
datum_offsets. Combine steps perform their combining duty only if
datum_offsets contains a valid value, that is, if scan_all_nonnull is not set.
I'm afraid this is still not correct :-(
The following code is not doing the right thing:
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ result->scan_all_nonnull = step_result->scan_all_nonnull;
The last line there is not properly performing a union; it just sets
result->scan_all_nonnull to whatever the last step's value was.
At the very least it should be |= but I don't really like this new code.
Why did you move away from just storing the matching partitions in a
Bitmapset? If you want to store all non-null partitions, then why not
just set the bits for all non-null partitions? That would cut down on
bugs like this since the combining of step results would just be
simple unions or intersects.
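To illustrate the suggestion above, here is a minimal standalone sketch. It is not code from the patch: `ToyStepResult` and the `toy_*` functions are hypothetical, and a plain `uint64_t` bitmask stands in for a Bitmapset (so it assumes at most 64 partitions). It contrasts the last-step-wins assignment with a real union, and shows how "all non-null partitions" can be expressed as ordinary bits so that combine steps reduce to OR and AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a step result: bit i set => scan partition i. */
typedef struct ToyStepResult
{
    uint64_t parts;
} ToyStepResult;

/* All partitions 0..nparts-1 except the one accepting only NULLs (if any). */
static uint64_t
toy_all_nonnull(int nparts, int null_part)
{
    uint64_t all;

    if (nparts <= 0)
        return 0;
    all = (nparts >= 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
    if (null_part >= 0 && null_part < 64)
        all &= ~(UINT64_C(1) << null_part);
    return all;
}

/* Union: a partition survives if any source step selected it. */
static ToyStepResult
toy_combine_union(const ToyStepResult *steps, size_t nsteps)
{
    ToyStepResult result = {0};

    for (size_t i = 0; i < nsteps; i++)
        result.parts |= steps[i].parts;
    return result;
}

/* Intersect: a partition survives only if every source step selected it. */
static ToyStepResult
toy_combine_intersect(const ToyStepResult *steps, size_t nsteps)
{
    ToyStepResult result = {nsteps > 0 ? UINT64_MAX : 0};

    for (size_t i = 0; i < nsteps; i++)
        result.parts &= steps[i].parts;
    return result;
}

/* The criticized pattern: plain assignment keeps only the last flag. */
static bool
toy_flag_last_wins(const bool *flags, size_t n)
{
    bool result = false;

    for (size_t i = 0; i < n; i++)
        result = flags[i];
    return result;
}

/* The minimal fix: accumulate with |= so any set flag is preserved. */
static bool
toy_flag_union(const bool *flags, size_t n)
{
    bool result = false;

    for (size_t i = 0; i < n; i++)
        result |= flags[i];
    return result;
}
```

With the flags {true, false}, the assignment version loses the first step's flag while the |= version keeps it; with bitmasks there is no separate flag to forget.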
Also, the following code could be made a bit nicer:
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result->bound_offsets = get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys,
+ &result->scan_all_nonnull);
Why not allocate the PruneStepResult inside the get_matching_*_bound
functions? That way you wouldn't need all those out parameters to set the
bool fields.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2018/04/02 21:03, David Rowley wrote:
On 2 April 2018 at 17:18, Amit Langote wrote:
On 2018/03/31 0:55, David Rowley wrote:
explain (analyze, costs off, summary off, timing off) execute q1 (1,2,2,0);
QUERY PLAN
--------------------------------
Result (actual rows=0 loops=1)
One-Time Filter: false
(2 rows)
Hmm. It is the newly added inversion step that's causing this. When
creating a generic plan (that is when the planning happens via
BuildCachedPlan called with boundParams set to NULL), the presence of
Params will cause an inversion step's source step to produce
scan-all-partitions sort of result, which the inversion step dutifully
inverts to a scan-no-partitions result.
I have tried to attack that problem by handling the
no-values-to-prune-with case using a side-channel to propagate the
scan-all-partitions result through possibly multiple steps. That is, a
base pruning step will set datum_offsets in a PruneStepResult only if
pruning is carried out by actually comparing values with the partition
bounds. If no values were provided (like in the generic plan case), it
will set a scan_all_nonnull flag instead and return without setting
datum_offsets. Combine steps perform their combining duty only if
datum_offsets contains a valid value, that is, if scan_all_nonnull is not set.
I'm afraid this is still not correct :-(
The following code is not doing the right thing:
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ result->scan_all_nonnull = step_result->scan_all_nonnull;
The last line there is not properly performing a union; it just sets
result->scan_all_nonnull to whatever the last step's value was.
At the very least it should be |= but I don't really like this new code.
Why did you move away from just storing the matching partitions in a
Bitmapset? If you want to store all non-null partitions, then why not
just set the bits for all non-null partitions? That would cut down on
bugs like this since the combining of step results would just be
simple unions or intersects.
As I mentioned in my previous email, I had to find a side-channel (that is,
scan_all_nonnull) to store this information instead of doing it the
regular way, to differentiate the case where we need to scan all
partitions because the values in the base prune steps are not available
from the case where carrying out a step using actual values ends up
selecting all partitions. When creating a generic plan, none of the
Params added to the base prune steps have values available, so the actual
pruning functions (get_matching_hash/list/range_bounds) are reached
without any values, and each of them then returns all partitions
containing non-null data.
But actually, the presence of only Params in the pruning steps should
result in the pruning not being invoked at all (at least for the static
pruning case), thus selecting all partitions containing non-null data. It
is better to implement that instead of a workaround like the
scan_all_nonnull side-channel I was talking about.
Fixed the patch to implement it that way.
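As a rough illustration of why skipping is safer than evaluating, here is a toy model. It is not the v47 code: `ToyClause`, `toy_prune_naive` and `toy_prune_guarded` are hypothetical names, partitions are a `uint64_t` bitmask, and the toy list partitioning assumes partition i accepts exactly the value i. The guarded variant skips each Param-valued clause individually, which is an assumption of this sketch; the patch may decide this at a different level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clause: "key = value", or "NOT (key = value)" when negate. */
typedef struct ToyClause
{
    bool is_param;   /* true: value is not known at plan time */
    bool negate;     /* true: result of the source step is inverted */
    int  value;      /* meaningful only when !is_param */
} ToyClause;

/* All partitions 0..nparts-1 except the NULL-only partition (if any). */
static uint64_t
toy_nonnull_parts(int nparts, int null_part)
{
    uint64_t all;

    if (nparts <= 0)
        return 0;
    all = (nparts >= 64) ? UINT64_MAX : ((UINT64_C(1) << nparts) - 1);
    if (null_part >= 0 && null_part < 64)
        all &= ~(UINT64_C(1) << null_part);
    return all;
}

/* Bit for the partition accepting 'value', or 0 if there is none. */
static uint64_t
toy_match(int value, int nparts, int null_part)
{
    if (value < 0 || value >= nparts || value >= 64 || value == null_part)
        return 0;
    return UINT64_C(1) << value;
}

/*
 * Naive: a Param-valued source step yields "all non-null partitions",
 * which an inversion step then turns into "no partitions".
 */
static uint64_t
toy_prune_naive(const ToyClause *c, size_t n, int nparts, int null_part)
{
    uint64_t all = toy_nonnull_parts(nparts, null_part);
    uint64_t result = all;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t src = c[i].is_param ? all
                                     : toy_match(c[i].value, nparts, null_part);

        result &= c[i].negate ? (all & ~src) : src;
    }
    return result;
}

/* Guarded: clauses without a plan-time value are never evaluated. */
static uint64_t
toy_prune_guarded(const ToyClause *c, size_t n, int nparts, int null_part)
{
    uint64_t all = toy_nonnull_parts(nparts, null_part);
    uint64_t result = all;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t src;

        if (c[i].is_param)
            continue;           /* nothing to prune with; keep everything */
        src = toy_match(c[i].value, nparts, null_part);
        result &= c[i].negate ? (all & ~src) : src;
    }
    return result;
}
```

With four partitions where partition 3 is the NULL partition, a single negated Param clause prunes everything in the naive version but keeps partitions 0 to 2 in the guarded one; for a negated constant clause both agree.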
Also, the following code could be made a bit nicer:
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result->bound_offsets = get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys,
+ &result->scan_all_nonnull);
Why not allocate the PruneStepResult inside the get_matching_*_bound
functions? That way you wouldn't need all those out parameters to set the
bool fields.
I thought it'd be nice to have perform_pruning_base_step generate the
actual PruneStepResult instead of the functions for individual
partitioning strategies, which in a way minimizes the places where it is
manipulated. Since we've divided bound searching into 3 separate
functions anyway, it also seemed better to me to have their signatures be
relevant to the partition strategy they cater to. The function for hash
partitioning, for example, never has to deal with setting the result for
the null or the default partition, and the range partitioning function
doesn't have to worry about the null partition.
Also, the overall footprint of those 3 functions is reduced because they
don't have to create the PruneStepResult themselves.
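The two designs being discussed can be contrasted with a small standalone sketch. Everything here is hypothetical (`ToyPruneResult`, `toy_hash_bounds`, `toy_list_bounds`, `toy_base_step`, `toy_list_result`); it uses `calloc` instead of `palloc0` and a toy bound search, and assumes 1 to 64 partitions. It only shows the shape of the signatures, not the actual functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified result of one base pruning step. */
typedef struct ToyPruneResult
{
    uint64_t bound_offsets;  /* bit i set => bound i matched */
    bool     scan_default;   /* default partition must be scanned */
} ToyPruneResult;

typedef enum ToyStrategy { TOY_HASH, TOY_LIST } ToyStrategy;

/*
 * Style A: strategy-specific helpers with narrow signatures.  The hash
 * helper has no default-partition out parameter because hash partitioning
 * never needs one.  nparts must be in 1..64.
 */
static uint64_t
toy_hash_bounds(unsigned value, unsigned nparts)
{
    return UINT64_C(1) << (value % nparts);
}

static uint64_t
toy_list_bounds(unsigned value, unsigned nparts, bool *scan_default)
{
    *scan_default = (value >= nparts);
    return *scan_default ? 0 : UINT64_C(1) << value;
}

/* Style A caller: the only place that allocates and fills the result. */
static ToyPruneResult *
toy_base_step(ToyStrategy strategy, unsigned value, unsigned nparts)
{
    ToyPruneResult *r = calloc(1, sizeof(*r));

    if (r == NULL)
        return NULL;
    switch (strategy)
    {
        case TOY_HASH:
            r->bound_offsets = toy_hash_bounds(value, nparts);
            break;
        case TOY_LIST:
            r->bound_offsets = toy_list_bounds(value, nparts,
                                               &r->scan_default);
            break;
    }
    return r;
}

/*
 * Style B: the strategy function allocates and returns the result itself,
 * so no out parameters are needed, but every strategy function now knows
 * the result struct.
 */
static ToyPruneResult *
toy_list_result(unsigned value, unsigned nparts)
{
    ToyPruneResult *r = calloc(1, sizeof(*r));

    if (r == NULL)
        return NULL;
    r->scan_default = (value >= nparts);
    if (!r->scan_default)
        r->bound_offsets = UINT64_C(1) << value;
    return r;
}
```

Both styles produce the same result for the same input; the difference is only where the result struct is created and how many functions need to know its layout.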
Attached v47.
Thanks,
Amit
Attachments:
v47-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 20a524d50895be0fb79659e1cbf9bc7c4f7173f6 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v47 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..b46b33d4f7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1881,7 +1881,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1900,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1918,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1965,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..83b03b41e4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v47-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 1e10d563349e01b7248caf0a18f739606dd5eb22 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v47 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
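
For readers following the <> test cases above: the next patch in this series prunes by evaluating a list of steps and then combining their results with union, intersect, and invert operations. What follows is only a simplified model of that combination logic, not code from the patch. It assumes at most 32 bounds so that a uint32_t can stand in for PostgreSQL's Bitmapset, and the names SketchResult, sketch_all_offsets, sketch_union, sketch_intersect, and sketch_invert are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-step result; a bitmask replaces Bitmapset. */
typedef struct SketchResult
{
	uint32_t	bound_offsets;	/* bit i set => bound offset i is selected */
	bool		scan_default;	/* default partition must be scanned */
	bool		scan_null;		/* null-accepting partition must be scanned */
} SketchResult;

/* Mask with bits 0 .. ndatums-1 set (assumption: 0 <= ndatums <= 32). */
static uint32_t
sketch_all_offsets(int ndatums)
{
	assert(ndatums >= 0 && ndatums <= 32);
	if (ndatums == 32)
		return UINT32_MAX;		/* avoid an undefined 32-bit shift */
	return (UINT32_C(1) << ndatums) - 1;
}

/* Union: a bound or special partition survives if either input selects it. */
static SketchResult
sketch_union(SketchResult a, SketchResult b)
{
	SketchResult r;

	r.bound_offsets = a.bound_offsets | b.bound_offsets;
	r.scan_default = a.scan_default || b.scan_default;
	r.scan_null = a.scan_null || b.scan_null;
	return r;
}

/* Intersect: a bound or special partition survives only if both select it. */
static SketchResult
sketch_intersect(SketchResult a, SketchResult b)
{
	SketchResult r;

	r.bound_offsets = a.bound_offsets & b.bound_offsets;
	r.scan_default = a.scan_default && b.scan_default;
	r.scan_null = a.scan_null && b.scan_null;
	return r;
}

/*
 * Invert: select every bound the source did not select.  The null flag is
 * flipped, while the default partition is always kept, because its contents
 * are not described by any bound.
 */
static SketchResult
sketch_invert(SketchResult source, int ndatums)
{
	SketchResult r;

	r.bound_offsets = sketch_all_offsets(ndatums) & ~source.bound_offsets;
	r.scan_null = !source.scan_null;
	r.scan_default = true;
	return r;
}
```

In this model, a clause such as a <> 'a' on a list-partitioned table corresponds to inverting the result of the step for a = 'a'; whether a null-accepting or default partition actually exists is checked separately when the final result is turned into partition indexes.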
Attachment: v47-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From d18b8da1ea9847d0386729a024f50e534c45eef3 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v47 3/4] Faster partition pruning

This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.

With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.

Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1176 +++++++++++++++++
src/backend/nodes/copyfuncs.c | 36 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/nodes/outfuncs.c | 28 +
src/backend/nodes/readfuncs.c | 30 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1757 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 74 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 +++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
20 files changed, 3525 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..823b818f80 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,23 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ /*
+ * This contains the offsets of those bounds in a table's boundinfo whose
+ * corresponding partitions are selected by a given pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +214,25 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static Bitmapset *get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static Bitmapset *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_null, bool *scan_default);
+static Bitmapset *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_default);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1656,1149 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *last_step_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ last_step_result = step_results[num_steps - 1];
+ Assert(last_step_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(last_step_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (last_step_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ if (partition_bound_accepts_nulls(context->boundinfo))
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (last_step_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ if (partition_bound_has_default(context->boundinfo))
+ result = bms_add_member(result,
+ context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * The result also indicates whether the special null-accepting and/or
+ * default partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ PruneStepResult *result;
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * get_matching_hash_bound(), get_matching_list_bounds(), or
+ * get_matching_range_bounds(), called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ result->bound_offsets = get_matching_hash_bound(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+ /*
+ * Since hash partitioning has neither of the special partitions
+ * (null and default), scan_null and scan_default are not set.
+ */
+ result->scan_null = result->scan_default = false;
+ break;
+
+ case PARTITION_STRATEGY_LIST:
+ result->bound_offsets = get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys,
+ &result->scan_null,
+ &result->scan_default);
+ break;
+
+ case PARTITION_STRATEGY_RANGE:
+ result->bound_offsets = get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys,
+ &result->scan_default);
+ /* There is no special null-accepting range partition. */
+ result->scan_null = false;
+ break;
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return result;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /*
+ * Set bound_offsets. A bound's offset will be added to
+ * the results if it is present in either the source or
+ * the target.
+ */
+ result->bound_offsets =
+ bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /*
+ * Set bound_offsets. A bound's offset will be added
+ * to the results only if it is present in both the
+ * source and the target.
+ */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ case COMBINE_INVERT:
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int source_step_id;
+ PruneStepResult *source;
+
+ /*
+ * There should only ever be one source step to invert the
+ * result of.
+ */
+ Assert(list_length(cstep->source_stepids) == 1);
+ source_step_id = linitial_int(cstep->source_stepids);
+ if (source_step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ source = step_results[source_step_id];
+ Assert(source != NULL);
+
+ /*
+ * Set bound_offsets. A bound's offset will be added to
+ * the results only if it is present in target but not in
+ * source.
+ */
+
+ /* First add all possible datum offsets. */
+ result->bound_offsets =
+ bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ /* Remove from it the members present in source. */
+ result->bound_offsets =
+ bms_del_members(result->bound_offsets,
+ source->bound_offsets);
+
+ /*
+ * Invert whether to scan the null partition, relative to what
+ * the source step would've determined.
+ */
+ Assert(!source->scan_null ||
+ partition_bound_accepts_nulls(boundinfo));
+ result->scan_null = !source->scan_null;
+
+ /*
+ * Unlike other partitions, the set of values contained in
+ * the default partition is unspecified, so it does not
+ * make sense to determine whether or not to scan it by
+ * simply inverting what the source step would've decided.
+ * That's because the boundinfo does not explicitly
+ * contain the datums corresponding to the default
+ * partition. In fact, we should *always* scan the
+ * default partition in this case, because the set of
+ * datums after inversion, other than those that have a
+ * non-default partition defined, would still contain
+ * datums of the partition key's type that could only be
+ * in the default partition.
+ *
+ * XXX - the above reasoning only seems to apply if the
+ * table is list partitioned. Maybe we should Assert that
+ * it is. Currently, we generate a combine step with
+ * the inversion op only for a case that's supported for
+ * list partitioning.
+ */
+ result->scan_default = true;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bound
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_matching_hash_bound(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ return bms_make_singleton(rowHash % greatest_modulus);
+ }
+
+ return bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ *
+ * '*scan_null' is set if the special null-accepting partition should be
+ * scanned
+ *
+ * '*scan_default' is set if the special default partition should be scanned
+ */
+static Bitmapset *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_null, bool *scan_default)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ *scan_null = *scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ *scan_null = true;
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types); however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber &&
+ partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions.
+ */
+ if (nvalues == 0)
+ return bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ return bms_make_singleton(off);
+ }
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the number of datums we have partitions
+ * for, the only possible partition that could contain a match is
+ * the default partition, but we must've set *scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return NULL;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is smaller than the datums of all non-default partitions,
+ * the only possible partition that could contain a match is the
+ * default partition, but we must've set *scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return NULL;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ return bms_add_range(NULL, minoff, maxoff);
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If the default partition needs to be scanned for the given values,
+ * *scan_default is set to true, provided a default partition exists.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static Bitmapset *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys,
+ bool *scan_default)
+{
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ *scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ return bms_add_range(NULL, minoff, maxoff);
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan
+ * the default partition, if any.
+ */
+ if (nvalues < partnatts && partition_bound_has_default(boundinfo))
+ *scan_default = true;
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(off + 1);
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case in which it isn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ *scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ return bms_add_range(NULL, minoff, maxoff);
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ return bms_make_singleton(off + 1);
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up operator is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ if (partition_bound_has_default(boundinfo))
+ *scan_default = true;
+ return NULL;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain the look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ *scan_default = true;
+ }
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether
+ * or not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE &&
+ partition_bound_has_default(boundinfo))
+ {
+ *scan_default = true;
+ }
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ *scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return NULL;
+ return bms_add_range(NULL, minoff, maxoff);
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 770ed3b1a8..96eff92619 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2136,6 +2136,36 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5058,6 +5088,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 3c302db057..0a916609bc 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d962670e..efd0a71a2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1710,6 +1710,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -3958,6 +3980,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0cdb..25874074a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2596,6 +2622,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..fd89c7cfee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1145,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..6ab81aca1e
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1757 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key to generate partition pruning "steps"
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned. Otherwise,
+ * they'd need to be delivered to the executor where the missing information
+ * can be filled and pruning tried one more time, which would be called
+ * "dynamic pruning".
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and partition key passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ bool static_pruning;
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PartitionPruneStep *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses, true,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If static_pruning is true, include in the result only steps that contain at
+ * least one Const. If any of the clauses in the input list is a
+ * pseudo-constant "false", *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool static_pruning, bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.static_pruning = static_pruning;
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions is also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the global list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find any that is marked pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS],
+ *ne_clauses = NIL;
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may themselves contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which is handled in
+ * match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false,
+ is_neop_listp;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps,
+ &is_neop_listp))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+
+ Assert(pc != NULL);
+ /*
+ * If the clause was one containing an operator named <>,
+ * we generate special pruning steps designed to handle
+ * those, so collect it in a separate list.
+ */
+ if (is_neop_listp)
+ ne_clauses = lappend(ne_clauses, pc);
+ else
+ {
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * Combine expressions from all <> operator clauses into one prune step.
+ * What we do is we convert what would originally be:
+ *
+ * ne_clause1 AND ne_clause2 .. AND ne_clauseN
+ *
+ * into:
+ *
+ * NOT (eq_clause1 OR eq_clause2 .. OR eq_clauseN)
+ *
+ * where each of the eq_clauses is constructed with the valid negator of
+ * the <> operator appearing in the corresponding ne_clause.
+ */
+ if (ne_clauses != NIL)
+ {
+ List *step_ids = NIL;
+ PartitionPruneStep *unionStep,
+ *invertStep,
+ *nullpartStep;
+
+ Assert(part_scheme->strategy == PARTITION_STRATEGY_LIST);
+ foreach(lc, ne_clauses)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+
+ /*
+ * Generate an opstep using what must be a btree = operator, that
+ * is, the negator of <> originally appearing in the clause.
+ */
+ step = generate_pruning_step_op(context, BTEqualStrategyNumber,
+ list_make1(pc->expr),
+ list_make1_oid(pc->cmpfn),
+ NULL);
+ if (step != NULL)
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ /*
+ * Moreover, we must add an explicit step as an argument of the union
+ * step being built to select the NULL-only partition (if any), so
+ * that it is excluded from the final result by subsequent inversion.
+ * That's because all these <> clauses are strict and hence won't
+ * select any records of the NULL-only partition.
+ */
+ Assert(part_scheme->partnatts == 1);
+ nullpartStep = generate_pruning_step_op(context, 0, NIL, NIL,
+ bms_make_singleton(0));
+ if (nullpartStep != NULL)
+ step_ids = lappend_int(step_ids, nullpartStep->step_id);
+
+ /* Combine all opsteps above using a UNION combine step first. */
+ if (step_ids != NIL)
+ {
+ unionStep = generate_pruning_step_combine(context, step_ids,
+ COMBINE_UNION);
+ /* Now add a step to invert the results. */
+ invertStep = generate_pruning_step_combine(context,
+ list_make1_int(unionStep->step_id),
+ COMBINE_INVERT);
+ result = lappend(result, invertStep);
+ }
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context, 0, NIL, NIL, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context, 0, NIL, NIL, NULL);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause and *is_neop_listp set if the clause contained a <>
+ * operator
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps,
+ bool *is_neop_listp)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *is_neop_listp = false;
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ *is_neop_listp = false;
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ *is_neop_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!*is_neop_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ if (*is_neop_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this clause
+ * is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context, step_opstrategy,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return step != NULL ? list_make1(step) : NIL;
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+ PartitionPruneStep *step;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ step = generate_pruning_step_op(context, step_opstrategy,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to the global context->steps and receives a
+ * globally unique identifier that's sourced from context->next_step_id.
+ */
+
+static PartitionPruneStep *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ bool contains_const = false;
+ PartitionPruneStepOp *opstep;
+
+ /*
+ * For static pruning, we require there to be present at least one
+ * constant to pass to the pruning code.
+ */
+ foreach(lc, exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, Const))
+ {
+ contains_const = true;
+ break;
+ }
+ }
+
+ if (exprs != NIL && !contains_const && context->static_pruning)
+ return NULL;
+
+ opstep = makeNode(PartitionPruneStepOp);
+ opstep->step.step_id = context->next_step_id++;
+ opstep->opstrategy = opstrategy;
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+static PartitionPruneStep *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
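The combination logic described for get_steps_using_prefix above can be illustrated with a small standalone sketch in plain C. This is not the patch's code: Clause, Step, MAX_KEYS, gen_steps and steps_using_prefix are hypothetical names, plain integers stand in for expressions, and the prefix is assumed to be sorted by key number. For every way of choosing one prefix clause per earlier key, the sketch emits one lookup key that ends with the value for the last key.

```c
#include <stddef.h>

#define MAX_KEYS 4				/* hypothetical cap, akin to PARTITION_MAX_KEYS */

/* Hypothetical stand-in for PartClauseInfo: a key number and a value. */
typedef struct Clause
{
	int			keyno;
	int			value;
} Clause;

/* One generated lookup key: values for keys 0 .. nvalues - 1. */
typedef struct Step
{
	int			nvalues;
	int			values[MAX_KEYS];
} Step;

/*
 * Choose one clause per key from prefix[pos .. nprefix), then append
 * last_value.  'cur' holds the values chosen so far (ncur of them).
 * Returns the updated number of steps written to 'out'; the caller must
 * size 'out' to hold every combination.
 */
static int
gen_steps(const Clause *prefix, int nprefix, int pos,
		  int *cur, int ncur, int last_value,
		  Step *out, int nout)
{
	int			keyno;
	int			next;
	int			i;

	if (pos == nprefix)
	{
		/* All earlier keys have a value; emit one complete lookup key. */
		Step	   *s = &out[nout];

		for (i = 0; i < ncur; i++)
			s->values[i] = cur[i];
		s->values[ncur] = last_value;
		s->nvalues = ncur + 1;
		return nout + 1;
	}

	/* Too many distinct keys for this sketch; emit nothing. */
	if (ncur + 1 >= MAX_KEYS)
		return nout;

	/* Find the first clause belonging to the next key. */
	keyno = prefix[pos].keyno;
	next = pos;
	while (next < nprefix && prefix[next].keyno == keyno)
		next++;

	/* Try each clause of the current key in turn and recurse. */
	for (i = pos; i < next; i++)
	{
		cur[ncur] = prefix[i].value;
		nout = gen_steps(prefix, nprefix, next, cur, ncur + 1,
						 last_value, out, nout);
	}
	return nout;
}

/* Entry point: an empty prefix yields a single one-value step. */
static int
steps_using_prefix(const Clause *prefix, int nprefix, int last_value,
				   Step *out)
{
	int			cur[MAX_KEYS];

	return gen_steps(prefix, nprefix, 0, cur, 0, last_value, out, 0);
}
```

With prefix clauses for key 0 holding the values 1 and 2, one clause for key 1 holding 10, and a last value of 100, the sketch produces two lookup keys: (1, 10, 100) and (2, 10, 100). The patch's own recursion differs in how it manages its lists, so treat this only as a model of the enumeration.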
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b46b33d4f7..32e973385d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fce48026b6..4df979e9eb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..a71d729e72 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT,
+ COMBINE_INVERT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
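As a rough illustration of how a combine step might be evaluated, the sketch below models each step's result as a bitmask of partition indexes and merges the results of the source steps. It is only a model under stated assumptions: SketchCombineOp and combine_step_results are hypothetical names, a uint32_t stands in for a Bitmapset, and COMBINE_INVERT is left out because its exact semantics are not spelled out in the comments above.

```c
#include <stdint.h>

/* Hypothetical mirror of two of the PartitionPruneCombineOp values. */
typedef enum SketchCombineOp
{
	SKETCH_COMBINE_UNION,
	SKETCH_COMBINE_INTERSECT
} SketchCombineOp;

/*
 * step_results[id] is the set of partitions (bit i set = partition i
 * survives) that the step with identifier 'id' produced.  Combine the
 * results of the steps listed in source_stepids according to 'op'.
 * With no source steps the result is the empty set.
 */
static uint32_t
combine_step_results(const uint32_t *step_results,
					 const int *source_stepids, int nsources,
					 SketchCombineOp op)
{
	uint32_t	result = 0;
	int			i;

	for (i = 0; i < nsources; i++)
	{
		uint32_t	src = step_results[source_stepids[i]];

		if (i == 0)
			result = src;		/* seed with the first source step */
		else if (op == SKETCH_COMBINE_UNION)
			result |= src;		/* OR arguments: keep partitions from any */
		else
			result &= src;		/* AND arguments: keep only common ones */
	}
	return result;
}
```

For instance, the hash-partitioning regression test in this patch ORs three conditions that individually select hp3, hp2 and hp0. Assuming bit i stands for partition hp<i>, the three step results would be 0x8, 0x4 and 0x1, and their union 0xD matches the three partitions that remain in the expected plan.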
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 83b03b41e4..9b9aabddef 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..fb2f4b80fc
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool static_pruning, bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d8a44cd9e..aa2ec281c4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1587,6 +1588,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1599,6 +1601,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1752,6 +1758,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
Attachment: v47-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 396c3cc0b53ad345cb2b43e5d8c04d4928d636b4 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v47 4/4] Add only unpruned partitioned child rels to
partitioned_rels
Planner nodes (Merge)Append, ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time, that should be known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 96eff92619..c36de32521 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2294,21 +2294,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5113,9 +5098,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 5a0151eece..68cc1eee32 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3213,9 +3203,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index efd0a71a2c..e6793b4716 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2296,6 +2295,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2345,6 +2345,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2570,16 +2571,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4127,9 +4118,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd89c7cfee..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -884,6 +884,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1337,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1347,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1374,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1435,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_rels set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4df979e9eb..1ec8030d4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9b9aabddef..afe1faf2ea 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2130,27 +2135,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa2ec281c4..adde8eaee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1611,7 +1611,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
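To illustrate the idea in the commit message of patch 4/4 above, here is a minimal standalone sketch of "bubbling up" the RT indexes of unpruned partitioned tables from children to their parent. It is not code from the patch: SketchRel, MAX_RELS and sketch_collect_partitioned() are hypothetical names, fixed-size arrays stand in for the planner's List, and the is_dummy flag stands in for the IS_DUMMY_REL() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RELS 16				/* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL struct. */
typedef struct SketchRel
{
	int			rti;			/* range table index of this rel */
	bool		is_partitioned; /* true if this is a partitioned table */
	bool		is_dummy;		/* true if pruning proved the rel empty */
	int			nchildren;
	struct SketchRel *children[MAX_RELS];
	int			npartitioned;	/* entries used in partitioned_child_rels */
	int			partitioned_child_rels[MAX_RELS];
} SketchRel;

/*
 * Fill rel->partitioned_child_rels with the RT indexes of the partitioned
 * tables at or below 'rel' that survived pruning.  A partitioned rel starts
 * with its own index; each live child's list is then appended to it.
 */
void
sketch_collect_partitioned(SketchRel *rel)
{
	int			i,
				j;

	rel->npartitioned = 0;
	if (rel->is_partitioned)
		rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (i = 0; i < rel->nchildren; i++)
	{
		SketchRel  *child = rel->children[i];

		/* A pruned child contributes nothing to the parent's list. */
		if (child->is_dummy)
			continue;

		sketch_collect_partitioned(child);
		for (j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

With a root (RT index 1) that has two partitioned children, one live (index 2) and one pruned (index 4), the root's list ends up holding only 1 and 2, so the pruned sub-partitioned table would not be recorded in the Append's partitioned_rels.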
On 4 April 2018 at 00:02, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> But actually, the presence of only Params in the pruning steps should
> result in the pruning not being invoked at all (at least for the static
> pruning case), thus selecting all partitions containing non-null data. It
> is better to implement that instead of a workaround like scan_all_nonnulls
> side-channel I was talking about.
I don't think this is quite true. Since we're only using strict
clauses, a list of quals with just Params still means that NULLs can't
match. If you skip the step altogether, won't you have lost the chance
at pruning away any NULL-only partition?
I think it would be better to just have special handling in
get_matching_list_bound so that it knows it's performing <>
elimination. I'd thought about passing some other opstrategy but the
only safe one I thought to use was InvalidStrategy, which is already
used by NULL handling.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4 April 2018 at 09:47, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 4 April 2018 at 00:02, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> But actually, the presence of only Params in the pruning steps should
>> result in the pruning not being invoked at all (at least for the static
>> pruning case), thus selecting all partitions containing non-null data. It
>> is better to implement that instead of a workaround like scan_all_nonnulls
>> side-channel I was talking about.
>
> I don't think this is quite true. Since we're only using strict
> clauses, a list of quals with just Params still means that NULLs can't
> match. If you skip the step altogether, won't you have lost the chance
> at pruning away any NULL-only partition?
>
> I think it would be better to just have special handling in
> get_matching_list_bound so that it knows it's performing <>
> elimination. I'd thought about passing some other opstrategy but the
> only safe one I thought to use was InvalidStrategy, which is already
> used by NULL handling.
I'm currently working up a patch to do this the way I think is best.
I'll submit it soon and we can review and get your thoughts on it.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4 April 2018 at 11:22, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 4 April 2018 at 09:47, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> I think it would be better to just have special handling in
>> get_matching_list_bound so that it knows it's performing <>
>> elimination. I'd thought about passing some other opstrategy but the
>> only safe one I thought to use was InvalidStrategy, which is already
>> used by NULL handling.
>
> I'm currently working up a patch to do this the way I think is best.
> I'll submit it soon and we can review and get your thoughts on it.
I've attached a rough cut version of what I think is a good solution
for this. It's based on v46, not your latest v47, sorry.
This makes get_matching_list_bounds() aware that it's performing the
not-equal pruning via the opstrategy which allows it to not return all
partitions when there are no values in this case. Instead, we return
the NULL partition, so that we later invert that and return everything
apart from the NULL partition. A strict clause will allow us that
much, even if we can't get the actual value being compared to, at the
time.
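As a rough standalone illustration of the "exclude, then invert" idea for a strict <> clause on a list-partitioned table (not the code in the attached patch), consider the sketch below. SketchListBounds and sketch_prune_not_equal() are hypothetical names, a 32-bit mask stands in for a Bitmapset, and the sketch assumes there is no default partition and that every partition accepts at least one value or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list-partition layout; one bit per partition in the result. */
typedef struct SketchListBounds
{
	int			nparts;			/* number of partitions, 1..31 here */
	int			nvalues;		/* number of non-NULL list values */
	const int  *values;			/* the list values */
	const int  *value_part;		/* partition index accepting each value */
} SketchListBounds;

/*
 * Return the set of partitions to scan for the strict clause "key <> value".
 *
 * We first build the set of partitions that cannot contain a match and then
 * invert it.  A partition is excluded when every non-NULL value it accepts
 * equals 'value'; that includes a NULL-only partition, which accepts no
 * non-NULL values at all.  If have_value is false (say, the comparison value
 * is a Param not known yet), only a NULL-only partition gets excluded.
 */
uint32_t
sketch_prune_not_equal(const SketchListBounds *b, bool have_value, int value)
{
	uint32_t	all;
	uint32_t	excluded = 0;
	int			p,
				i;

	assert(b->nparts > 0 && b->nparts < 32);
	all = (UINT32_C(1) << b->nparts) - 1;

	for (p = 0; p < b->nparts; p++)
	{
		int			n_total = 0;
		int			n_match = 0;

		for (i = 0; i < b->nvalues; i++)
		{
			if (b->value_part[i] != p)
				continue;
			n_total++;
			if (have_value && b->values[i] == value)
				n_match++;
		}

		if (n_total == n_match)
			excluded |= UINT32_C(1) << p;
	}

	/* Invert: scan everything that was not proven empty. */
	return all & ~excluded;
}
```

With partitions p0 IN (1), p1 IN (2, 3), p2 IN (4) and a NULL-only p3, an unknown comparison value still prunes p3, while a known value of 1 prunes both p0 and p3.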
There's also a bunch of other changes in there:
1. Adding missing step_id in copyfuncs.c
2. Simplified including the default partition in a bunch of cases.
3. Made it so scan_default and scan_null are only ever set to true if
there's a partition for that.
4. Changed get_matching_*_bounds to return the entire result struct
instead of the Bitmapset and pass the remaining bool values back
through params. I didn't really like how you'd changed this to pass all
the bool flags back as params. There's a perfectly good struct there
to provide the entire result in a single return value. I know you've
disagreed with this already, so would be nice to get a 3rd opinion.
5. Rename get_matching_hash_bound to get_matching_hash_bounds. The
LIST and RANGE version of this function both had a plural name. I
didn't see any reason for the hash case to be different.
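Points 3 and 4 in the list above can be pictured with a small standalone sketch, shown here for the IS NULL case of list partitioning. This is not the patch's code: SketchPruneResult and sketch_match_null_key() are hypothetical, the struct is returned by value rather than as a palloc'd pointer, and the sketch assumes that NULL keys are routed to the default partition when no partition explicitly accepts NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PruneStepResult. */
typedef struct SketchPruneResult
{
	uint32_t	bound_offsets;	/* one bit per matching bound */
	bool		scan_default;	/* scan the default partition? */
	bool		scan_null;		/* scan the NULL-accepting partition? */
} SketchPruneResult;

/*
 * Result for a "key IS NULL" step.  The whole result comes back in one
 * struct instead of a bitmap plus bool output parameters, and each flag is
 * set only if the corresponding partition actually exists.
 */
SketchPruneResult
sketch_match_null_key(bool has_null_part, bool has_default_part)
{
	SketchPruneResult result = {0, false, false};

	if (has_null_part)
		result.scan_null = true;
	else
		result.scan_default = has_default_part;

	return result;
}
```

A caller then reads result.scan_null and result.scan_default directly, and neither flag can be true for a partition that does not exist.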
Let me know what you think.
I've patched the run-time pruning v18 against this and it now passes regression.
I need to do a bit more testing on this to ensure it works for all
cases, but thought I'd send now as I suspect you're currently around
to look.
There might be another issue with the patch too, but I'll send a
separate email about that.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v46_fixes_drowley.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 2a088b416b..5a5f6b0f5f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -153,13 +153,6 @@ typedef struct PruneStepResult
*/
Bitmapset *bound_offsets;
- /*
- * Set if we need to scan all partitions that contain non-null data; if
- * this is set, bound_offsets should be NULL and its value should not be
- * relied upon.
- */
- bool scan_all_nonnull;
-
/* Set if we need to scan the default and/or the null partition, resp. */
bool scan_default;
bool scan_null;
@@ -230,19 +223,16 @@ static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *cont
PruneStepResult **step_results);
static bool partkey_datum_from_expr(PartitionPruneContext *context,
Expr *expr, Datum *value);
-static Bitmapset *get_matching_hash_bound(PartitionPruneContext *context,
- int opstrategy, Datum *values, int nvalues,
- FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_all_nonnull);
-static Bitmapset *get_matching_list_bounds(PartitionPruneContext *context,
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
int opstrategy, Datum value, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_null, bool *scan_default,
- bool *scan_all_nonnull);
-static Bitmapset *get_matching_range_bounds(PartitionPruneContext *context,
+ Bitmapset *notnullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
int opstrategy, Datum *values, int nvalues,
- FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_default, bool *scan_all_nonnull);
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
/*
* RelationBuildPartitionDesc
@@ -1677,12 +1667,11 @@ Bitmapset *
get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps)
{
- Bitmapset *result;
+ Bitmapset *partitions;
int num_steps = list_length(pruning_steps),
i;
- PruneStepResult **step_results,
- *last_step_result;
- Bitmapset *bound_offsets;
+ PruneStepResult **results,
+ *final_result;
ListCell *lc;
/* If there are no pruning steps then all partitions match. */
@@ -1696,8 +1685,8 @@ get_matching_partitions(PartitionPruneContext *context,
* result of applying all pruning steps is the value contained in the slot
* of the last pruning step.
*/
- step_results = (PruneStepResult **)
- palloc0(num_steps * sizeof(PruneStepResult *));
+ results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
foreach(lc, pruning_steps)
{
PartitionPruneStep *step = lfirst(lc);
@@ -1705,18 +1694,14 @@ get_matching_partitions(PartitionPruneContext *context,
switch (nodeTag(step))
{
case T_PartitionPruneStepOp:
- step_results[step->step_id] =
- perform_pruning_base_step(context,
+ results[step->step_id] = perform_pruning_base_step(context,
(PartitionPruneStepOp *) step);
- Assert(!step_results[step->step_id]->scan_all_nonnull ||
- step_results[step->step_id]->bound_offsets == NULL);
break;
case T_PartitionPruneStepCombine:
- step_results[step->step_id] =
- perform_pruning_combine_step(context,
- (PartitionPruneStepCombine *) step,
- step_results);
+ results[step->step_id] = perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ results);
break;
default:
@@ -1730,16 +1715,10 @@ get_matching_partitions(PartitionPruneContext *context,
* partitions need to be in the result, including special null-accepting
* and default partitions. Collect the actual partition indexes now.
*/
- last_step_result = step_results[num_steps - 1];
- Assert(last_step_result != NULL);
- if (last_step_result->scan_all_nonnull)
- bound_offsets = bms_add_range(NULL, 0,
- context->boundinfo->ndatums - 1);
- else
- bound_offsets = last_step_result->bound_offsets;
i = -1;
- result = NULL;
- while ((i = bms_next_member(bound_offsets, i)) >= 0)
+ partitions = NULL;
+ final_result = results[num_steps - 1];
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
{
int partindex = context->boundinfo->indexes[i];
@@ -1752,26 +1731,28 @@ get_matching_partitions(PartitionPruneContext *context,
* the query's conditions.
*/
if (partindex >= 0)
- result = bms_add_member(result, partindex);
+ partitions = bms_add_member(partitions, partindex);
}
/* Add the null and/or default partition if needed and if present. */
- if (last_step_result->scan_null)
+ if (final_result->scan_null)
{
Assert(context->strategy == PARTITION_STRATEGY_LIST);
- if (partition_bound_accepts_nulls(context->boundinfo))
- result = bms_add_member(result, context->boundinfo->null_index);
+ Assert(partition_bound_accepts_nulls(context->boundinfo));
+ partitions = bms_add_member(partitions,
+ context->boundinfo->null_index);
}
- if (last_step_result->scan_default)
+ if (final_result->scan_default)
{
Assert(context->strategy == PARTITION_STRATEGY_LIST ||
context->strategy == PARTITION_STRATEGY_RANGE);
- if (partition_bound_has_default(context->boundinfo))
- result = bms_add_member(result,
+ Assert(partition_bound_has_default(context->boundinfo));
+
+ partitions = bms_add_member(partitions,
context->boundinfo->default_index);
}
- return result;
+ return partitions;
}
/* Module-local functions */
@@ -1788,7 +1769,6 @@ static PruneStepResult *
perform_pruning_base_step(PartitionPruneContext *context,
PartitionPruneStepOp *opstep)
{
- PruneStepResult *result;
ListCell *lc1,
*lc2;
int keyno,
@@ -1859,55 +1839,35 @@ perform_pruning_base_step(PartitionPruneContext *context,
}
}
- result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
-
switch (context->strategy)
{
case PARTITION_STRATEGY_HASH:
- result->bound_offsets = get_matching_hash_bound(context,
- opstep->opstrategy,
- values, nvalues,
- partsupfunc,
- opstep->nullkeys,
- &result->scan_all_nonnull);
- /*
- * Since there are neither of the special partitions (null and
- * default) in case of hash partitioning, scan_null and
- * scan_default are not set.
- */
- result->scan_null = result->scan_default = false;
- break;
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
case PARTITION_STRATEGY_LIST:
- result->bound_offsets = get_matching_list_bounds(context,
- opstep->opstrategy,
- values[0], nvalues,
- &partsupfunc[0],
- opstep->nullkeys,
- &result->scan_null,
- &result->scan_default,
- &result->scan_all_nonnull);
- break;
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys,
+ opstep->notnullkeys);
case PARTITION_STRATEGY_RANGE:
- result->bound_offsets = get_matching_range_bounds(context,
- opstep->opstrategy,
- values, nvalues,
- partsupfunc,
- opstep->nullkeys,
- &result->scan_default,
- &result->scan_all_nonnull);
- /* There is no special null-accepting range partition. */
- result->scan_null = false;
- break;
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
default:
elog(ERROR, "unexpected partition strategy: %d",
(int) context->strategy);
- break;
+ return NULL; /* keep compiler quiet */
}
-
- return result;
}
/*
@@ -1938,8 +1898,7 @@ perform_pruning_combine_step(PartitionPruneContext *context,
{
PartitionBoundInfo boundinfo = context->boundinfo;
- result->bound_offsets = NULL;
- result->scan_all_nonnull = true;
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
result->scan_default = partition_bound_has_default(boundinfo);
result->scan_null = partition_bound_accepts_nulls(boundinfo);
return result;
@@ -1968,22 +1927,13 @@ perform_pruning_combine_step(PartitionPruneContext *context,
step_result = step_results[step_id];
Assert(step_result != NULL);
- result->scan_all_nonnull = step_result->scan_all_nonnull;
-
- /*
- * Set bound_offsets if required. A bound's offset will
- * be added to the results if it is present in either the
- * source or the target.
- */
- if (!result->scan_all_nonnull)
- result->bound_offsets =
- bms_add_members(result->bound_offsets,
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets =
+ bms_add_members(result->bound_offsets,
step_result->bound_offsets);
/* Update whether to scan null and default partitions. */
- if (!result->scan_null)
- result->scan_null = step_result->scan_null;
- if (!result->scan_default)
- result->scan_default = step_result->scan_default;
+ result->scan_null |= step_result->scan_null;
+ result->scan_default |= step_result->scan_default;
}
break;
@@ -2005,19 +1955,12 @@ perform_pruning_combine_step(PartitionPruneContext *context,
result->bound_offsets = step_result->bound_offsets;
result->scan_null = step_result->scan_null;
result->scan_default = step_result->scan_default;
- result->scan_all_nonnull =
- step_result->scan_all_nonnull;
firststep = false;
}
else
{
- /*
- * Set bound_offsets if required. A bound's offset
- * will be added to the results only if it is present
- * in both the source and the target.
- */
- if (!step_result->scan_all_nonnull)
- result->bound_offsets =
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
bms_int_members(result->bound_offsets,
step_result->bound_offsets);
/*
@@ -2049,32 +1992,20 @@ perform_pruning_combine_step(PartitionPruneContext *context,
source = step_results[source_step_id];
Assert(source != NULL);
- result->scan_all_nonnull = source->scan_all_nonnull;
-
- /*
- * Set bound_offsets if required. A bound's offset
- * will be added to the results only if it is present
- * in target but not the source.
- */
- if (!result->scan_all_nonnull)
- {
- /* First add all possible datum offsets. */
- result->bound_offsets =
- bms_add_range(NULL, 0,
- boundinfo->ndatums - 1);
- /* Remove from it the members present in source. */
- result->bound_offsets =
+ /* Generate bitwise-NOT of members. */
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->bound_offsets =
bms_del_members(result->bound_offsets,
source->bound_offsets);
- }
/*
- * Revert whether to scan the null partition as the source
- * steps would've determined it.
+ * Invert whether to scan the null partition as the source
+ * steps would've determined it. We need not set this flag
+ * if there's no NULL partition.
*/
- Assert(!source->scan_null ||
- partition_bound_accepts_nulls(boundinfo));
- result->scan_null = !source->scan_null;
+ result->scan_null = !source->scan_null &&
+ partition_bound_accepts_nulls(boundinfo);
/*
* Unlike other partitions, the set of values contained in
@@ -2089,14 +2020,9 @@ perform_pruning_combine_step(PartitionPruneContext *context,
* non-default partition defined, would still contain
* datums of the partition key's type that could only be
* in the default partition.
- *
- * XXX - the above reasoing only seems to apply if the
- * table is list partitioned. Maybe we should Assert that
- * it is. Currently, we generate a combine step with
- * the inversion op only for a case that's supported for
- * list partitioning.
*/
- result->scan_default = true;
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
}
break;
@@ -2131,7 +2057,7 @@ partkey_datum_from_expr(PartitionPruneContext *context,
}
/*
- * get_matching_hash_bound
+ * get_matching_hash_bounds
* Determine offset of the hash bound matching the specified value,
* considering that all the non-null values come from clauses containing
* a compatible hash eqaulity operator and any keys that are null come
@@ -2152,16 +2078,13 @@ partkey_datum_from_expr(PartitionPruneContext *context,
* hash for the type of the values contained in 'values'
* 'nullkeys' is the set of partition keys that are null.
- *
- * '*scan_all_nonnull' is set if all partitions containing non-null datums
- * should be scanned
*/
-static Bitmapset *
-get_matching_hash_bound(PartitionPruneContext *context,
- int opstrategy, Datum *values, int nvalues,
- FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_all_nonnull)
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
{
+ PruneStepResult *result;
PartitionBoundInfo boundinfo = context->boundinfo;
int *partindices = boundinfo->indexes;
int partnatts = context->partnatts;
@@ -2172,6 +2095,8 @@ get_matching_hash_bound(PartitionPruneContext *context,
Assert(context->strategy == PARTITION_STRATEGY_HASH);
+ result = palloc0(sizeof(PruneStepResult));
+
/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
@@ -2179,7 +2104,6 @@ get_matching_hash_bound(PartitionPruneContext *context,
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
- *scan_all_nonnull = false;
/*
* If there are any values, they must have come from clauses
* containing an equality operator compatible with hash partitioning.
@@ -2193,12 +2117,13 @@ get_matching_hash_bound(PartitionPruneContext *context,
rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
if (partindices[rowHash % greatest_modulus] >= 0)
- return bms_make_singleton(rowHash % greatest_modulus);
+ result->bound_offsets = bms_make_singleton(rowHash % greatest_modulus);
}
else
- *scan_all_nonnull = true;
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
- return NULL;
+ return result;
}
/*
@@ -2216,22 +2141,14 @@ get_matching_hash_bound(PartitionPruneContext *context,
* to perform partition_list_bsearch
* 'nullkeys' is the set of partition keys that are null.
- *
- * '*scan_null' is set if the special null-accepting partition should be
- * scanned
- *
- * '*scan_default' is set if the special default partition should be scanned
- *
- * '*scan_all_nonnull' is set if all partitions containing non-null datums
- * should be scanned
*/
-static Bitmapset *
+static PruneStepResult *
get_matching_list_bounds(PartitionPruneContext *context,
int opstrategy, Datum value, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_null, bool *scan_default,
- bool *scan_all_nonnull)
+ Bitmapset *notnullkeys)
{
+ PruneStepResult *result;
PartitionBoundInfo boundinfo = context->boundinfo;
int off,
minoff,
@@ -2243,7 +2160,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
Assert(context->strategy == PARTITION_STRATEGY_LIST);
Assert(context->partnatts == 1);
- *scan_null = *scan_default = *scan_all_nonnull = false;
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
if (!bms_is_empty(nullkeys))
{
@@ -2253,10 +2170,21 @@ get_matching_list_bounds(PartitionPruneContext *context,
* the former doesn't exist.
*/
if (partition_bound_accepts_nulls(boundinfo))
- *scan_null = true;
- else if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * Handle IS NOT NULL clauses. Here we just include everything apart from
+ * the NULL partition.
+ */
+ if (!bms_is_empty(notnullkeys))
+ {
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
/*
@@ -2265,9 +2193,8 @@ get_matching_list_bounds(PartitionPruneContext *context,
*/
if (boundinfo->ndatums == 0)
{
- if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
minoff = 0;
@@ -2283,21 +2210,29 @@ get_matching_list_bounds(PartitionPruneContext *context,
*/
if (opstrategy != BTEqualStrategyNumber &&
partition_bound_has_default(boundinfo))
- *scan_default = true;
+ result->scan_default = true;
/*
- * If there are no values to compare with the datums in boundinfo, it
- * means the caller asked for partitions for all non-null datums. Add
- * indexes of *all* partitions.
+ * If there are no values to compare with the datums in boundinfo then
+ * handle this according to what the opstrategy is set to. Normally this
+ * will mean we must include all non-null datums, however, in the case
+ * that we're performing partition elimination for not-equal clauses, lack
+ * of any values means we want to return only the NULL partition. The
+ * result of this will be inverted to become all partitions apart from the
+ * NULL.
*/
if (nvalues == 0)
{
- *scan_all_nonnull = true;
- return NULL;
+ if (opstrategy != InvalidStrategy)
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ else
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
}
switch (opstrategy)
{
+ case InvalidStrategy: /* inverted not-equal case */
case BTEqualStrategyNumber:
off = partition_list_bsearch(partsupfunc,
partcollation,
@@ -2306,11 +2241,11 @@ get_matching_list_bounds(PartitionPruneContext *context,
if (off >= 0 && is_equal)
{
Assert(boundinfo->indexes[off] >= 0);
- return bms_make_singleton(off);
+ result->bound_offsets = bms_make_singleton(off);
+ return result;
}
- else if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
case BTGreaterEqualStrategyNumber:
inclusive = true;
@@ -2342,7 +2277,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
* above anyway if one exists.
*/
if (off > boundinfo->ndatums - 1)
- return NULL;
+ return result;
minoff = off;
break;
@@ -2365,7 +2300,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
* above anyway if one exists.
*/
if (off < 0)
- return NULL;
+ return result;
maxoff = off;
break;
@@ -2375,7 +2310,8 @@ get_matching_list_bounds(PartitionPruneContext *context,
break;
}
- return bms_add_range(NULL, minoff, maxoff);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
}
/*
@@ -2402,12 +2338,12 @@ get_matching_list_bounds(PartitionPruneContext *context,
*
* 'nullkeys' is the set of partition keys that are null.
*/
-static Bitmapset *
+static PruneStepResult *
get_matching_range_bounds(PartitionPruneContext *context,
int opstrategy, Datum *values, int nvalues,
- FmgrInfo *partsupfunc, Bitmapset *nullkeys,
- bool *scan_default, bool *scan_all_nonnull)
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
{
+ PruneStepResult *result;
PartitionBoundInfo boundinfo = context->boundinfo;
Oid *partcollation = context->partcollation;
int partnatts = context->partnatts;
@@ -2422,7 +2358,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
Assert(context->strategy == PARTITION_STRATEGY_RANGE);
Assert(nvalues <= partnatts);
- *scan_default = *scan_all_nonnull = false;
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
/*
* If there are no datums to compare keys with, or if we got an IS NULL
@@ -2430,9 +2366,8 @@ get_matching_range_bounds(PartitionPruneContext *context,
*/
if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
{
- if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
minoff = 0;
@@ -2451,10 +2386,10 @@ get_matching_range_bounds(PartitionPruneContext *context,
if (partindices[maxoff] < 0)
maxoff--;
- if (partition_bound_has_default(boundinfo))
- *scan_default = true;
+ result->scan_default = partition_bound_has_default(boundinfo);
- return bms_add_range(NULL, minoff, maxoff);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
}
/*
@@ -2462,7 +2397,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
* the default partition, if any.
*/
if (nvalues < partnatts && partition_bound_has_default(boundinfo))
- *scan_default = true;
+ result->scan_default = true;
switch (opstrategy)
{
@@ -2482,10 +2417,12 @@ get_matching_range_bounds(PartitionPruneContext *context,
{
/* There can only be zero or one matching partition. */
if (partindices[off + 1] >= 0)
- return bms_make_singleton(off + 1);
- else if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ {
+ result->bound_offsets = bms_make_singleton(off + 1);
+ return result;
+ }
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
else
{
@@ -2591,14 +2528,15 @@ get_matching_range_bounds(PartitionPruneContext *context,
{
if (partindices[i] < 0)
{
- *scan_default = true;
+ result->scan_default = true;
break;
}
}
}
Assert(minoff >= 0 && maxoff >= 0);
- return bms_add_range(NULL, minoff, maxoff);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
}
else if (off >= 0) /* !is_equal */
{
@@ -2610,18 +2548,17 @@ get_matching_range_bounds(PartitionPruneContext *context,
* only partition that may contain the look-up value.
*/
if (partindices[off + 1] >= 0)
- return bms_make_singleton(off + 1);
- else if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
/*
* off < 0, meaning the look-up value is smaller that all bounds,
* so only the default partition, if any, qualifies.
*/
- else if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
case BTGreaterEqualStrategyNumber:
inclusive = true;
@@ -2719,9 +2656,8 @@ get_matching_range_bounds(PartitionPruneContext *context,
* All bounds are greater than the key, so we could only
* expect to find the look-up key in the default partition.
*/
- if (partition_bound_has_default(boundinfo))
- *scan_default = true;
- return NULL;
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
}
else
{
@@ -2794,7 +2730,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
PARTITION_RANGE_DATUM_VALUE &&
partition_bound_has_default(boundinfo))
{
- *scan_default = true;
+ result->scan_default = true;
}
minoff++;
@@ -2815,7 +2751,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
PARTITION_RANGE_DATUM_VALUE &&
partition_bound_has_default(boundinfo))
{
- *scan_default = true;
+ result->scan_default = true;
}
maxoff--;
@@ -2832,7 +2768,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
{
if (partindices[i] < 0)
{
- *scan_default = true;
+ result->scan_default = true;
break;
}
}
@@ -2840,8 +2776,9 @@ get_matching_range_bounds(PartitionPruneContext *context,
Assert(minoff >= 0 && maxoff >= 0);
if (minoff > maxoff)
- return NULL;
- return bms_add_range(NULL, minoff, maxoff);
+ return result;
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
}
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f5634274a9..51f1baa42e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2141,10 +2141,12 @@ _copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
{
PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+ COPY_SCALAR_FIELD(step.step_id);
COPY_SCALAR_FIELD(opstrategy);
COPY_NODE_FIELD(exprs);
COPY_NODE_FIELD(cmpfns);
COPY_BITMAPSET_FIELD(nullkeys);
+ COPY_BITMAPSET_FIELD(notnullkeys);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 3f9e2585c7..de0b3def2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1703,6 +1703,7 @@ _outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
WRITE_NODE_FIELD(exprs);
WRITE_NODE_FIELD(cmpfns);
WRITE_BITMAPSET_FIELD(nullkeys);
+ WRITE_BITMAPSET_FIELD(notnullkeys);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 8348933151..00eb43823f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1338,6 +1338,7 @@ _readPartitionPruneStepOp(void)
READ_NODE_FIELD(exprs);
READ_NODE_FIELD(cmpfns);
READ_BITMAPSET_FIELD(nullkeys);
+ READ_BITMAPSET_FIELD(notnullkeys);
READ_DONE();
}
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index e5e6d7530b..f793d7d266 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -119,8 +119,8 @@ static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context
List *step_exprs,
List *step_cmpfns);
static Node *generate_pruning_step_op(GeneratePruningStepsContext *context,
- int opstrategy,
- List *exprs, List *cmpfns, Bitmapset *nullkeys);
+ int opstrategy, List *exprs, List *cmpfns,
+ Bitmapset *nullkeys, Bitmapset *notnullkeys);
static Node *generate_pruning_step_combine(GeneratePruningStepsContext *context,
List *source_stepids,
PartitionPruneCombineOp combineOp);
@@ -434,7 +434,6 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
COMBINE_UNION));
continue;
}
-
/*
* Fall-through for a NOT clause, which is handled in
* match_clause_to_partition_key().
@@ -573,14 +572,21 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
PartitionPruneStep *step;
/*
- * Generate a opstep using what must be a btree = operator, that
- * is, the negator of <> originally appearing in the clause.
+ * Generate an opstep using what must be a btree = operator, that
+ * is, the negator of <> originally appearing in the clause. We
+ * pass InvalidStrategy as the opstrategy rather than
+ * BTEqualStrategyNumber to signal to the step execution code to
+ * handle the case when no values are available for pruning.
+ * Normally when no values are available we'd return all non-null
+ * partitions, but in this case we want to return NULL, so that
+ * when the result is inverted it becomes all non-null partitions.
*/
step = (PartitionPruneStep *)
generate_pruning_step_op(context,
- BTEqualStrategyNumber,
+ InvalidStrategy,
list_make1(pc->expr),
list_make1_oid(pc->cmpfn),
+ NULL,
NULL);
step_ids = lappend_int(step_ids, step->step_id);
}
@@ -594,8 +600,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
*/
Assert(part_scheme->partnatts == 1);
nullpartStep = (PartitionPruneStep *)
- generate_pruning_step_op(context, 0, NIL, NIL,
- bms_make_singleton(0));
+ generate_pruning_step_op(context, InvalidStrategy, NIL, NIL,
+ bms_make_singleton(0), NULL);
step_ids = lappend_int(step_ids, nullpartStep->step_id);
/* Combine all opsteps above using a UNION combine step first. */
@@ -626,8 +632,9 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
(part_scheme->strategy != PARTITION_STRATEGY_HASH ||
bms_num_members(nullkeys) == part_scheme->partnatts))
result = lappend(result,
- generate_pruning_step_op(context, 0, NIL, NIL,
- nullkeys));
+ generate_pruning_step_op(context,
+ InvalidStrategy, NIL,
+ NIL, nullkeys, NULL));
/*
* Note that for IS NOT NULL clauses, simply having step suffices;
@@ -638,8 +645,9 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
if (!bms_is_empty(notnullkeys) &&
part_scheme->strategy != PARTITION_STRATEGY_HASH)
result = lappend(result,
- generate_pruning_step_op(context, 0, NIL, NIL,
- NULL));
+ generate_pruning_step_op(context,
+ InvalidStrategy, NIL,
+ NIL, NULL, notnullkeys));
}
else
{
@@ -1535,7 +1543,7 @@ get_steps_using_prefix(GeneratePruningStepsContext *context,
return list_make1(generate_pruning_step_op(context, step_opstrategy,
list_make1(step_lastexpr),
list_make1_oid(step_lastcmpfn),
- step_nullkeys));
+ step_nullkeys, NULL));
/* Recurse to generate steps for various combinations. */
return get_steps_using_prefix_recurse(context,
@@ -1669,7 +1677,8 @@ get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
step_opstrategy,
step_exprs1,
step_cmpfns1,
- step_nullkeys));
+ step_nullkeys,
+ NULL));
}
}
@@ -1685,7 +1694,8 @@ get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
static Node *
generate_pruning_step_op(GeneratePruningStepsContext *context,
int opstrategy, List *exprs, List *cmpfns,
- Bitmapset *nullkeys)
+ Bitmapset *nullkeys,
+ Bitmapset *notnullkeys)
{
PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
@@ -1694,6 +1704,7 @@ generate_pruning_step_op(GeneratePruningStepsContext *context,
opstep->exprs = exprs;
opstep->cmpfns = cmpfns;
opstep->nullkeys = nullkeys;
+ opstep->notnullkeys = notnullkeys;
context->steps = lappend(context->steps, opstep);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index a71d729e72..6f28cd4d56 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1546,6 +1546,9 @@ typedef struct PartitionPruneStep
* to the hash partition bound search function. It is never possible to
* have an expression be present in 'exprs' for a given partition key and
* the corresponding bit set in 'nullkeys'.
+ *
+ * 'notnullkeys' is similar to 'nullkeys' but this can be set not just for
+ * IS NOT NULL clauses, but also for strict clauses in general.
*----------
*/
typedef struct PartitionPruneStepOp
@@ -1556,6 +1559,7 @@ typedef struct PartitionPruneStepOp
List *exprs;
List *cmpfns;
Bitmapset *nullkeys;
+ Bitmapset *notnullkeys;
} PartitionPruneStepOp;
/*----------
On 4 April 2018 at 13:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
There might be another issue with the patch too, but I'll send a
separate email about that.
In the current version of the patch the following comment exists:
/*
* Fall-through for a NOT clause, which is handled in
* match_clause_to_partition_key().
*/
The only real handling of NOT clauses is in
match_boolean_partition_clause() which just handles NOT(true) or
NOT(false).
It's true that the const simplification code will generally rewrite
most NOT(clause) to use the negator operator, but if the operator does
not have a negator it can't do this.
We probably don't have any built-in operators which are members of a
btree opclass which have no negator, but it's simple enough to modify
the citext extension by commenting out the NEGATOR lines in
citext--1.4--1.5.sql.
create extension citext;
create table listp(a citext) partition by list(a citext_pattern_ops);
create table listp_1 partition of listp for values in('1');
explain select * from listp where not (a ~>~ '0' and a ~<~ '2');
QUERY PLAN
--------------------------------------------------------------------------
Append (cost=0.00..36.45 rows=1209 width=32)
-> Seq Scan on listp_1 (cost=0.00..30.40 rows=1209 width=32)
Filter: ((NOT (a ~>~ '0'::citext)) OR (NOT (a ~<~ '2'::citext)))
(3 rows)
At the moment pruning does not work for this case at all. Perhaps it should?
I imagine it might be possible to re-work the COMBINE_INVERT code so
that it becomes a flag of the combine step rather than a step operator
type. It should then be possible to invert COMBINE_UNION for
NOT(clause1 OR clause2) and COMBINE_INTERSECT on NOT(clause1 AND
clause2).
IOW, it might not take too many lines of code to put this right.
Probably the bulk of the work would be writing a test with a btree
opclass that will allow us to have the planner not invert the clause
during const folding.
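A rough sketch of what "invert as a flag of the combine step" could look like, using plain bitmasks. The names ToyCombineOp and toy_combine are hypothetical, and the sketch deliberately ignores the NULL and default partitions, which need separate treatment when a result is inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    TOY_COMBINE_UNION,
    TOY_COMBINE_INTERSECT
} ToyCombineOp;

/*
 * Combine n source sets (bit i set => bound offset i matches), then
 * optionally complement the result within 'all', the set of every offset.
 */
uint64_t
toy_combine(ToyCombineOp op, bool invert,
            const uint64_t *sources, int n, uint64_t all)
{
    uint64_t    result;

    assert(n > 0);

    result = sources[0];
    for (int i = 1; i < n; i++)
    {
        if (op == TOY_COMBINE_UNION)
            result |= sources[i];
        else
            result &= sources[i];
    }

    /* NOT(a OR b) is an inverted UNION; NOT(a AND b) an inverted INTERSECT. */
    return invert ? (all & ~result) : result;
}
```

By De Morgan's laws, the inverted UNION equals the intersection of the individually inverted sources, and the inverted INTERSECT equals their union, so a single flag covers both NOT forms.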
--
David Rowley http://www.2ndQuadrant.com/
David Rowley <david.rowley@2ndquadrant.com> writes:
It's true that the const simplification code will generally rewrite
most NOT(clause) to use the negator operator, but if the operator does
not have a negator it can't do this.
...
At the moment pruning does not work for this case at all. Perhaps it should?
It's hard to see why we'd expend extra effort to optimize such situations.
The right answer would invariably be to fix the inadequate operator
definition, because missing the negator link would hobble many other
cases besides this.
Now if you can show a case where the extra smarts would be useful
without presuming a badly-written opclass, it's a different matter.
regards, tom lane
On 4 April 2018 at 16:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
It's true that the const simplification code will generally rewrite
most NOT(clause) to use the negator operator, but if the operator does
not have a negator it can't do this.
...
At the moment pruning does not work for this case at all. Perhaps it should?
It's hard to see why we'd expend extra effort to optimize such situations.
The right answer would invariably be to fix the inadequate operator
definition, because missing the negator link would hobble many other
cases besides this.
Now if you can show a case where the extra smarts would be useful
without presuming a badly-written opclass, it's a different matter.
Okay, well that certainly sounds like less work.
In that case, the comment which claims we handle the NOT clauses needs
to be updated to mention that we only handle boolean NOT clauses and
don't optimize the remainder.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi David.
On 2018/04/04 10:13, David Rowley wrote:
On 4 April 2018 at 11:22, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 4 April 2018 at 09:47, David Rowley <david.rowley@2ndquadrant.com> wrote:
I think it would be better to just have special handling in
get_matching_list_bounds so that it knows it's performing <>
elimination. I'd thought about passing some other opstrategy but the
only safe one I thought to use was InvalidStrategy, which is already
used by NULL handling.
Thanks for this suggestion.
Having the special case handling for steps corresponding to <> operator
clauses in get_matching_list_bounds() seems like the best way and that
should have been the way all along. It occurred to me that after I
changed the patch to store datum offsets in the result, there wasn't any
need for special handling of <> operators at a higher level -- like the
special pruning function (get_partitions_excluded_by_ne_datums) that we
used to have or the COMBINE_INVERT I recently proposed.
For each datum coming from a <> operator clause (signaled to
get_matching_list_bounds by passing InvalidStrategy for opstrategy), we
return all datums minus the one that was passed (if the latter is indeed
found in boundinfo). Bounds for individual <> operator clauses will be
combined using INTERSECT at a higher level to give the desired result.
There is no need for the invert step, nor for the planner to set things up
very carefully for the invert step to do the right thing.
create table lp (a int) partition by list (a);
create table lp1 partition of lp for values in (1);
create table lp2 partition of lp for values in (2);
create table lp3 partition of lp for values in (3);
create table lp_null partition of lp for values in (null);
create table lp_default partition of lp default;
For
explain select * from lp where a <> 1
get_matching_list_bounds will return the offsets of datums {2, 3} and
set scan_default to true, while leaving scan_null false;
and for
explain select * from lp where a <> 1 and a <> 3
it will return the offsets of datums {2, 3} and {1, 2} for the individual
base steps, each with scan_default set to true and scan_null set to false;
the INTERSECT combination step then combines those to give the offset of
datum 2, with scan_default set to true and scan_null set to false.
QUERY PLAN
--------------------------------------------------------------------
Append (cost=0.00..121.75 rows=5050 width=4)
-> Seq Scan on lp2 (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 3))
-> Seq Scan on lp_default (cost=0.00..48.25 rows=2525 width=4)
Filter: ((a <> 1) AND (a <> 3))
If there are no values then the offsets of all the bounds will be returned
and unlike the previous setup with the INVERT step in the mix, they will
all survive.
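To make the above concrete, here is a minimal standalone sketch of the
idea, not the code in the attached patch. All names in it (ToyStepResult,
toy_ne_step, toy_intersect) are hypothetical, a uint32_t stands in for the
Bitmapset of datum offsets, and it assumes that the INTERSECT combine step
ANDs the scan_default and scan_null flags of its inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PruneStepResult; a uint32_t plays the Bitmapset. */
typedef struct ToyStepResult
{
	uint32_t	bound_offsets;	/* bit i set => datum at offset i selected */
	bool		scan_default;
	bool		scan_null;
} ToyStepResult;

/*
 * Toy base step for "key <> value" on a list-partitioned table whose sorted
 * bound datums are datums[0..ndatums-1].  Selects every datum except the one
 * equal to value, if it is present.  With have_value == false, all datums
 * are selected.
 */
ToyStepResult
toy_ne_step(const int *datums, int ndatums, bool have_value, int value,
			bool has_default)
{
	ToyStepResult res;
	int			i;

	assert(ndatums >= 0 && ndatums < 32);
	res.bound_offsets = (UINT32_C(1) << ndatums) - 1;	/* all datums */
	for (i = 0; have_value && i < ndatums; i++)
	{
		if (datums[i] == value)
			res.bound_offsets &= ~(UINT32_C(1) << i);
	}
	res.scan_default = has_default; /* other values may live there */
	res.scan_null = false;		/* <> is strict, so NULLs never match */
	return res;
}

/* Toy INTERSECT combine step: AND the offsets and both flags (assumption). */
ToyStepResult
toy_intersect(ToyStepResult a, ToyStepResult b)
{
	ToyStepResult res;

	res.bound_offsets = a.bound_offsets & b.bound_offsets;
	res.scan_default = a.scan_default && b.scan_default;
	res.scan_null = a.scan_null && b.scan_null;
	return res;
}
```

With datums {1, 2, 3} stored at offsets 0, 1 and 2, the steps for a <> 1
and a <> 3 select offsets {1, 2} and {0, 1} respectively, and intersecting
them leaves only offset 1 (datum 2) plus the default partition, which
matches the plan shown above.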
I know we've been back and forth quite a bit on this, but this solution
seems like the one with the least amount of hackery. Hope you find it to
be the same way.
I'm currently working up a patch to do this the way I think is best.
I'll submit it soon and we can review and get your thoughts on it.
I've attached a rough cut version of what I think is a good solution
for this. It's based on v46, not your latest v47, sorry.
This makes get_matching_list_bounds() aware that it's performing the
not-equal pruning via the opstrategy which allows it to not return all
partitions when there are no values in this case. Instead, we return
the NULL partition, so that we later invert that and return everything
apart from the NULL partition. A strict clause will allow us that
much, even if we can't get the actual value being compared to, at the
time.
As I explained above, I considered your general idea of teaching
get_matching_list_bounds to deal with being passed InvalidStrategy for
opstrategy to signal special handling of a <> clause datum.
By implementing that, I was able to get rid of a bunch of code in
partprune.c and remove the COMBINE_INVERT related code. We can add
COMBINE_INVERT later if and when we need it (for some legitimate purpose).
There's also a bunch of other changes in there:
Thanks.
1. Adding missing step_id in copyfuncs.c
Merged.
2. Simplified including the default partition in a bunch of cases.
3. Made it so scan_default and scan_null are only ever set to true if
there's a partition for that.
I have merged these too.
4. Changed get_matching_*_bounds to return the entire result struct
instead of the Bitmapset and pass the remaining bool values back
through params. I didn't really like how you'd change this to pass all
the bool flags back as params. There's a perfectly good struct there
to provide the entire result in a single return value. I know you've
disagreed with this already, so would be nice to get a 3rd opinion.
I went ahead with them returning PruneStepResult struct.
5. Rename get_matching_hash_bound to get_matching_hash_bounds. The
LIST and RANGE version of this function both had a plural name. I
didn't see any reason for the hash case to be different.
Agreed, merged.
Let me know what you think.
I'm not sure about the following change in your patch:
- if (!result->scan_null)
- result->scan_null = step_result->scan_null;
- if (!result->scan_default)
- result->scan_default = step_result->scan_default;
+ result->scan_null |= step_result->scan_null;
+ result->scan_default |= step_result->scan_default;
Afaik, |= does bitwise OR, which even if it might give the result we want,
is not a logical operation. I had written the original code using the
following definition of logical OR.
a OR b = if a then true else b
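For what it's worth, the two forms can be compared with a small standalone
check. This is only a sketch, not code from either patch: the function
names are hypothetical, and it assumes a C99 bool from <stdbool.h>, whose
values are always 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>

/* The form in the original code: a OR b = if a then true else b. */
bool
or_conditional(bool a, bool b)
{
	if (!a)
		a = b;
	return a;
}

/* The form in the proposed change: compound bitwise OR on a bool. */
bool
or_bitwise_assign(bool a, bool b)
{
	a |= b;
	return a;
}

/* Asserts that both forms agree for all four input combinations. */
void
check_or_forms(void)
{
	int			a,
				b;

	for (a = 0; a <= 1; a++)
		for (b = 0; b <= 1; b++)
			assert(or_conditional(a != 0, b != 0) ==
				   or_bitwise_assign(a != 0, b != 0));
}
```

Under that assumption the two forms give the same answer for every input,
so the choice between them is one of style rather than behavior. The same
would not hold in general for a bitwise AND on a wider integer type used
as a boolean, since two nonzero values can AND to zero.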
Also, since things work normally even if we pass no values to
get_matching_list_bounds, including via the dummy step generated for IS
NOT NULL clause(s), I don't see the need to store notnullkeys in the prune
step. Especially, it's redundant to set notnullkeys in the pruning step
containing non-empty exprs since, by definition, they will select
partitions containing non-null datums.
I've patched the run-time pruning v18 against this and it now passes regression.
I need to do a bit more testing on this to ensure it works for all
cases, but thought I'd send now as I suspect you're currently around
to look.
See if attached works for you.
There might be another issue with the patch too, but I'll send a
separate email about that.
I suppose this is the email about support for pruning using NOT clauses in
partition.c. It might be possible to do that by tweaking things somehow
by re-introducing the COMBINE_INVERT step (legitimately needed in that
case) and modifying partprune.c to capture NOT clauses in more cases than
it does currently.
For now, as you suggested, I modified the comment to say that we only
support Boolean NOT clauses in special cases, such as when a Boolean
partitioning opfamily is used.
Attached v48.
Thanks,
Amit
Attachments:
v48-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 7c42ded6afcc7ce3c46666fab7b86ced615c2746 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v48 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..b46b33d4f7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1881,7 +1881,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1900,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1918,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1965,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..83b03b41e4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v48-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 31b6fa9e425735b0537a27793cb6c3bcf4a3224e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v48 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v48-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From f3fcac342ca3d6a216aa6b3dd4f4755b61da04c0 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v48 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1115 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 37 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/nodes/outfuncs.c | 28 +
src/backend/nodes/readfuncs.c | 30 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1678 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 73 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 ++++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
20 files changed, 3385 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..df45ce3b43 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,23 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ /*
+ * This contains the offsets of the bounds in a table's boundinfo, each of
+ * which is a bound whose corresponding partition is selected by a given
+ * pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +214,23 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1654,1090 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **step_results,
+ *final_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ step_results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ step_results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ step_results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ step_results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ final_result = step_results[num_steps - 1];
+ Assert(final_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (final_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ if (partition_bound_accepts_nulls(context->boundinfo))
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (final_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ if (partition_bound_has_default(context->boundinfo))
+ result = bms_add_member(result,
+ context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also contains whether special null-accepting and/or default
+ * partition need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of the
+ * get_matching_*_bounds functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return NULL;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets =
+ bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bounds
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created. Also, if an insufficient number of values was provided, all
+ * bounds are returned.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->bound_offsets =
+ bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ /*
+ * There is neither a special hash null partition nor a default hash
+ * partition.
+ */
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result->scan_null = result->scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default if any.
+ */
+ if (nvalues == 0)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /* Special case handling of values coming from a <> operator clause. */
+ if (opstrategy == InvalidStrategy)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
+ value, &is_equal);
+ if (off >= 0 && is_equal)
+ {
+
+ /* All bounds except this one qualify. */
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_del_member(result->bound_offsets,
+ off);
+ }
+
+ /* Always include the default partition if any. */
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ return result;
+ }
+
+ /*
+ * With range queries, always include the default list partition, because
+ * list partitions divide the key space in a discontinuous manner, so not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_make_singleton(off);
+ }
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result->scan_null = result->scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case it couldn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1, then would be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1: off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return result;
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
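For anyone reading along without a build tree handy, here is a rough, standalone sketch of how the step results in get_matching_partitions() and perform_pruning_combine_step() above are evaluated in step_id order. It is not PostgreSQL code: StepKind, Step, StepResult, run_steps() and example_result() are hypothetical names, a 64-bit mask stands in for Bitmapset, and the special case of a combine step with no source steps (which returns all bounds in the patch) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's node types; not PostgreSQL code. */
typedef enum { STEP_BASE, STEP_UNION, STEP_INTERSECT } StepKind;

typedef struct
{
    uint64_t bound_offsets;   /* bit i set => bound offset i survives */
    int      scan_null;
    int      scan_default;
} StepResult;

#define MAX_SOURCES 4
#define MAX_STEPS   16

typedef struct
{
    StepKind   kind;
    StepResult base;                  /* precomputed result for STEP_BASE */
    int        nsources;
    int        sources[MAX_SOURCES];  /* ids of earlier steps to combine */
} Step;

/* Combine the results of earlier steps, as a union or an intersection. */
static StepResult
combine_step(const Step *step, int step_id, const StepResult *results)
{
    StepResult r = {0, 0, 0};
    int        i;

    for (i = 0; i < step->nsources; i++)
    {
        const StepResult *src;

        /* A combine step may only refer to steps evaluated before it. */
        assert(step->sources[i] < step_id);
        src = &results[step->sources[i]];

        if (step->kind == STEP_INTERSECT && i == 0)
            r = *src;                 /* copy the first source as-is */
        else if (step->kind == STEP_INTERSECT)
        {
            r.bound_offsets &= src->bound_offsets;
            r.scan_null = r.scan_null && src->scan_null;
            r.scan_default = r.scan_default && src->scan_default;
        }
        else
        {
            r.bound_offsets |= src->bound_offsets;
            r.scan_null = r.scan_null || src->scan_null;
            r.scan_default = r.scan_default || src->scan_default;
        }
    }
    return r;
}

/* Evaluate steps in id order; the last step's slot is the final result. */
static StepResult
run_steps(const Step *steps, int nsteps)
{
    StepResult results[MAX_STEPS];
    int        id;

    assert(nsteps > 0 && nsteps <= MAX_STEPS);
    for (id = 0; id < nsteps; id++)
        results[id] = (steps[id].kind == STEP_BASE)
            ? steps[id].base
            : combine_step(&steps[id], id, results);
    return results[nsteps - 1];
}

/* (offsets {2,3,4} AND offsets {0,1,2}) OR offsets {5} */
static StepResult
example_result(void)
{
    static const Step steps[] = {
        {STEP_BASE,      {0x1C, 0, 1}, 0, {0}},
        {STEP_BASE,      {0x07, 0, 1}, 0, {0}},
        {STEP_INTERSECT, {0, 0, 0},    2, {0, 1}},
        {STEP_BASE,      {0x20, 0, 0}, 0, {0}},
        {STEP_UNION,     {0, 0, 0},    2, {2, 3}},
    };

    return run_steps(steps, 5);
}
```

Calling example_result() from a small main() leaves bound offsets 2 and 5 set, with scan_default carried through from the intersected steps. The point of the flat results array is that a combine step only ever needs to look backwards, which is what the step_id check in the patch enforces.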
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c3efca3c45..450c64d6fc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2136,6 +2136,37 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5059,6 +5090,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4157e7eb9a..c3f1789ce2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d962670e..efd0a71a2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1710,6 +1710,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -3958,6 +3980,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0cdb..25874074a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2596,6 +2622,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..fd89c7cfee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1145,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..ab6390234b
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1678 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides the functionality to match the provided set of clauses with
+ * the partition key and generate partition pruning "steps"
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned. Otherwise,
+ * they'd need to be delivered to the executor where the missing information
+ * can be filled and pruning tried one more time, which would be called
+ * "dynamic pruning".
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ bool isopne; /* is clause's original operator <> ? */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result match_clause_to_partition_key produces for the
+ * clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ bool static_pruning;
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PartitionPruneStep *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool isopne,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses, true,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If static_pruning is true, include in the result only steps that contain at
+ * least one Const. If any of the clauses in the input list is a
+ * pseudo-constant "false", *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool static_pruning, bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.static_pruning = static_pruning;
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions is also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, when going through clauses, we find one that is marked as pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS];
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which, if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ Assert(pc != NULL);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context, 0, false, NIL, NIL,
+ nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context, 0, false,
+ NIL, NIL, NULL);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+ bool is_opne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_opne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_opne_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ (*pc)->isopne = false;
+ if (is_opne_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ (*pc)->isopne = true;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Does the clause (or its negation) match this partition key? */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have necessary equality
+ * clause, there should be an IS NULL clause, otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key, depending
+ * on this clause's strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->isopne,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ false,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context,
+ step_opstrategy, step_isopne,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return step != NULL ? list_make1(step) : NIL;
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_isopne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_isopne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+ PartitionPruneStep *step;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ step = generate_pruning_step_op(context,
+ step_opstrategy, step_isopne,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Following functions generate pruning steps of various types. Each step
+ * that's created is added to a global context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static PartitionPruneStep *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool isopne,
+ List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+
+ if (!isopne)
+ opstep->opstrategy = opstrategy;
+ else
+ {
+ Assert(opstrategy == BTEqualStrategyNumber);
+ opstep->opstrategy = InvalidStrategy;
+ }
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+static PartitionPruneStep *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
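
A note for readers of the patch: the combination logic described in the
comments of get_steps_using_prefix() can be tried outside the planner. The
following is only a minimal standalone sketch, not the patch's code. Clause,
enumerate_lookup_keys() and the fixed-size arrays are hypothetical stand-ins
(plain ints replace the Expr nodes and cmpfn OIDs), and the recursion is
simplified to walk the whole prefix. It shows how a prefix holding several
clauses for the same key yields one look-up vector per combination.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 8		/* hypothetical cap, standing in for PARTITION_MAX_KEYS */
#define MAX_VECTORS 64	/* hypothetical cap on generated look-up keys */

/* Hypothetical stand-in for PartClauseInfo: a key number and one value. */
typedef struct Clause
{
	int			keyno;
	int			value;
} Clause;

/* Generated look-up keys; each row is one vector of values. */
static int	vectors[MAX_VECTORS][MAX_KEYS];
static int	vector_len[MAX_VECTORS];
static int	nvectors;

/*
 * Extend 'vec' (holding 'depth' values so far) with each clause of the key
 * found at prefix[start], then recurse into the clauses of the next key.
 * 'prefix' must be sorted by keyno.
 */
static void
enumerate_recurse(const Clause *prefix, int nprefix, int start,
				  int lastvalue, int *vec, int depth)
{
	int			cur_keyno;
	int			next_start;
	int			j;

	assert(depth < MAX_KEYS);

	if (start == nprefix)
	{
		/* Prefix exhausted: append the last value and record the vector. */
		assert(nvectors < MAX_VECTORS);
		memcpy(vectors[nvectors], vec, depth * sizeof(int));
		vectors[nvectors][depth] = lastvalue;
		vector_len[nvectors] = depth + 1;
		nvectors++;
		return;
	}

	/* Find the first clause that belongs to a later key. */
	cur_keyno = prefix[start].keyno;
	next_start = start;
	while (next_start < nprefix && prefix[next_start].keyno == cur_keyno)
		next_start++;

	/* One branch of the recursion per clause of the current key. */
	for (j = start; j < next_start; j++)
	{
		vec[depth] = prefix[j].value;
		enumerate_recurse(prefix, nprefix, next_start, lastvalue,
						  vec, depth + 1);
	}
}

/* Returns the number of look-up keys generated. */
static int
enumerate_lookup_keys(const Clause *prefix, int nprefix, int lastvalue)
{
	int			vec[MAX_KEYS];

	nvectors = 0;
	enumerate_recurse(prefix, nprefix, 0, lastvalue, vec, 0);
	return nvectors;
}
```

For example, with prefix clauses a = 1 and a = 2 (key 0) plus b = 10 (key 1),
and a last value of 100 for key 2, the sketch records the two vectors
(1, 10, 100) and (2, 10, 100). With an empty prefix it records just the last
value, which corresponds to the quick-exit path in get_steps_using_prefix().
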
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b46b33d4f7..32e973385d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fce48026b6..4df979e9eb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..c9d2187631 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,77 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offset of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
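
A note on how the two step types might fit together: the sketch below is not
taken from the patch (the code that executes steps is not part of this
excerpt), so treat it as an illustration under stated assumptions. It assumes
steps are evaluated in step_id order, that a combine step only refers to
earlier steps, and that the result of the last step is the final answer.
Step, StepKind and run_steps() are hypothetical names, and a uint32_t bitmask
stands in for a Bitmapset of partition indexes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SOURCES 4	/* hypothetical cap on source steps per combine step */

typedef enum StepKind
{
	STEP_OP,					/* stands in for PartitionPruneStepOp */
	STEP_COMBINE_UNION,			/* combine step with COMBINE_UNION */
	STEP_COMBINE_INTERSECT		/* combine step with COMBINE_INTERSECT */
} StepKind;

typedef struct Step
{
	StepKind	kind;
	uint32_t	op_result;		/* STEP_OP only: partitions matched, given
								 * here directly instead of being computed
								 * by a bound search */
	int			nsources;		/* combine steps only */
	int			source_ids[MAX_SOURCES];
} Step;

/*
 * Evaluate steps in order, storing each step's partition set in results[],
 * and return the set produced by the last step.
 */
static uint32_t
run_steps(const Step *steps, int nsteps, uint32_t *results)
{
	int			i;
	int			j;

	assert(nsteps > 0);
	for (i = 0; i < nsteps; i++)
	{
		const Step *step = &steps[i];
		uint32_t	acc;

		if (step->kind == STEP_OP)
		{
			results[i] = step->op_result;
			continue;
		}

		/* Start from "all" for intersection and from "none" for union. */
		acc = (step->kind == STEP_COMBINE_INTERSECT) ? UINT32_MAX : 0;
		for (j = 0; j < step->nsources; j++)
		{
			int			src = step->source_ids[j];

			/* A combine step may only look at already-evaluated steps. */
			assert(src >= 0 && src < i);
			if (step->kind == STEP_COMBINE_INTERSECT)
				acc &= results[src];
			else
				acc |= results[src];
		}
		results[i] = acc;
	}

	return results[nsteps - 1];
}
```

Under these assumptions, a clause of the form (A AND B) OR C becomes two op
steps intersected by one combine step, followed by a third op step and a
union over the intersection and that third step.
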
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 83b03b41e4..9b9aabddef 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..fb2f4b80fc
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool static_pruning, bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partition pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partition pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d8a44cd9e..aa2ec281c4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1587,6 +1588,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1599,6 +1601,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1752,6 +1758,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
v48-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch
From 15022f4fbeb73557eddcd74b6429ef385b3db2f8 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v48 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it is not yet known
which of those tables will actually be affected by the plan. Instead,
construct the list just before the Path for such a plan is built, by
which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
the related code in prepunion.c.
---
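To illustrate the idea described in the commit message, here is a minimal
standalone sketch (not part of the patch) of how a parent's list can be
built bottom-up so that it only mentions partitioned tables that survive
pruning. ToyRel, collect_partitioned_rels() and MAX_RELS are hypothetical
names made up for this illustration; the patch itself does the equivalent
work in set_append_rel_size() and set_append_rel_pathlist(), using
RelOptInfo.partitioned_child_rels and the planner's List type.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RELS 16				/* arbitrary bound for this toy example */

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL struct. */
typedef struct ToyRel
{
	int			rti;			/* range table index of this relation */
	bool		is_partitioned; /* is this a partitioned table? */
	bool		is_pruned;		/* did pruning prove it cannot match? */
	struct ToyRel *children[MAX_RELS];
	int			nchildren;
	int			partitioned_child_rels[MAX_RELS];	/* collected RT indexes */
	int			npartitioned;	/* number of valid entries above */
} ToyRel;

/*
 * Fill rel->partitioned_child_rels with rel's own RT index followed by the
 * RT indexes of its unpruned partitioned descendants.  Returns the number
 * of entries collected; a non-partitioned rel collects nothing.
 */
static int
collect_partitioned_rels(ToyRel *rel)
{
	int			i;
	int			j;

	rel->npartitioned = 0;
	if (!rel->is_partitioned)
		return 0;

	/* A partitioned table's list always starts with its own index. */
	rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		/* Pruned children contribute nothing, including their subtrees. */
		if (child->is_pruned)
			continue;

		/* Build the child's list, then bubble it up into the parent's. */
		collect_partitioned_rels(child);
		for (j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}

	return rel->npartitioned;
}
```

For a root with RT index 1 that has a leaf partition 2, an unpruned
partitioned child 3 and a pruned partitioned child 5, the root's list in
this sketch ends up holding 1 and 3 only.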
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 450c64d6fc..b0fa556f71 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2295,21 +2295,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5115,9 +5100,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 45ceba2830..28eecbbf08 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3214,9 +3204,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index efd0a71a2c..e6793b4716 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2296,6 +2295,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2345,6 +2345,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2570,16 +2571,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4127,9 +4118,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd89c7cfee..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -884,6 +884,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1337,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1347,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1374,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1435,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4df979e9eb..1ec8030d4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9b9aabddef..afe1faf2ea 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2130,27 +2135,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa2ec281c4..adde8eaee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1611,7 +1611,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 2018/04/04 14:42, Amit Langote wrote:
Attached v48.
I had forgotten to remove the static_pruning parameter that I added in
v47; it is no longer used. Static pruning now occurs even if a step
contains only Params, in which case each of the
get_matching_hash/list/range_bounds() functions returns the offsets of all
non-null datums, because the Params cannot be resolved to actual values
during static pruning.
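To make the all-Params case concrete, here is a minimal sketch for an equality lookup against list bounds. It is not code from the patch: ToyStepResult, toy_list_bounds_eq() and the 64-bit mask standing in for a Bitmapset are hypothetical names chosen for illustration, and the sketch assumes that the default partition must be kept whenever the value is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's PruneStepResult.  A 64-bit mask
 * replaces Bitmapset, so at most 64 bound datums are supported here.
 */
typedef struct ToyStepResult
{
	uint64_t	bound_offsets;	/* bit i set => datum at offset i selected */
	bool		scan_default;	/* must the default partition be scanned? */
	bool		scan_null;		/* must the null partition be scanned? */
} ToyStepResult;

/*
 * Toy equality lookup against list-partition bounds.  'value_known' is
 * false when the comparison value is a Param that cannot be resolved at
 * plan time.
 */
static ToyStepResult
toy_list_bounds_eq(const int *datums, int ndatums, bool has_default,
				   bool value_known, int value)
{
	ToyStepResult result = {0, false, false};
	int			i;

	assert(ndatums >= 0 && ndatums <= 64);

	if (!value_known)
	{
		/* No usable value: select every non-null datum. */
		for (i = 0; i < ndatums; i++)
			result.bound_offsets |= UINT64_C(1) << i;
		/* Assumption: the default partition could also hold a match. */
		result.scan_default = has_default;
		return result;
	}

	for (i = 0; i < ndatums; i++)
	{
		if (datums[i] == value)
		{
			result.bound_offsets |= UINT64_C(1) << i;
			return result;
		}
	}

	/* The value matches no listed datum, so only the default can hold it. */
	result.scan_default = has_default;
	return result;
}
```

With three list datums and an unresolved Param, the sketch selects all three offsets and leaves scan_null unset, since a strict operator can never match a NULL key.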
Also included are a few changes to get_matching_partitions that David had
proposed in his delta patch but that I had failed to include in v48.
Attached v49.
Thanks,
Amit
Attachments:
v49-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 7c42ded6afcc7ce3c46666fab7b86ced615c2746 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v49 1/4] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..b46b33d4f7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1881,7 +1881,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1900,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1918,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1965,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..83b03b41e4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v49-0002-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 31b6fa9e425735b0537a27793cb6c3bcf4a3224e Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v49 2/4] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v49-0003-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From e395b2fa61693da45ced7493c3c572021f90b165 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v49 3/4] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1114 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 37 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/nodes/outfuncs.c | 28 +
src/backend/nodes/readfuncs.c | 30 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1672 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 44 +-
src/backend/optimizer/util/relnode.c | 8 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 73 ++
src/include/nodes/relation.h | 4 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 282 ++++-
src/test/regress/sql/partition_prune.sql | 39 +-
src/tools/pgindent/typedefs.list | 7 +
20 files changed, 3378 insertions(+), 73 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
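As a rough illustration of how a pruning result becomes the set of partitions to scan, the sketch below maps selected bound offsets to partition indexes, skips bounds that have no partition (index -1), and then adds the null and default partitions when requested. It is a simplified sketch, not the patch's code: toy_collect_partitions() and the 64-bit masks standing in for Bitmapsets are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn selected bound offsets into a mask of
 * partition indexes.  'indexes' plays the role of boundinfo->indexes,
 * where -1 means that no partition is defined for that bound.
 */
static uint64_t
toy_collect_partitions(uint64_t bound_offsets,
					   const int *indexes, int nindexes,
					   bool scan_null, int null_index,
					   bool scan_default, int default_index)
{
	uint64_t	result = 0;
	int			i;

	assert(nindexes >= 0 && nindexes <= 64);

	for (i = 0; i < nindexes; i++)
	{
		/* Skip bounds that the pruning steps did not select. */
		if ((bound_offsets & (UINT64_C(1) << i)) == 0)
			continue;

		/* Only bounds backed by an actual partition contribute. */
		if (indexes[i] >= 0)
		{
			assert(indexes[i] < 64);
			result |= UINT64_C(1) << indexes[i];
		}
	}

	/* Add the null and/or default partition if requested and present. */
	if (scan_null && null_index >= 0)
		result |= UINT64_C(1) << null_index;
	if (scan_default && default_index >= 0)
		result |= UINT64_C(1) << default_index;

	return result;
}
```

For example, with indexes {0, -1, 1} and all three offsets selected, only partitions 0 and 1 end up in the result; the gap at offset 1 is covered only if the default partition is also requested.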
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..e1ffc5271f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -138,6 +138,23 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ /*
+ * This contains the offsets of the bounds in a table's boundinfo, each of
+ * which is a bound whose corresponding partition is selected by a given
+ * pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -197,6 +214,23 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1620,9 +1654,1089 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **results,
+ *final_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ final_result = results[num_steps - 1];
+ Assert(final_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (final_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partition_bound_accepts_nulls(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (final_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(partition_bound_has_default(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * The result also indicates whether the special null-accepting and/or
+ * default partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_partitions_from_keys_* functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return NULL;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+
+ /*
+ * In some cases, the planner generates a combine step that doesn't
+ * contain any argument steps, to signal us to not prune any partitions.
+ * So, return indexes of all datums in that case, including null and/or
+ * default partition, if any.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+ else
+ {
+ bool firststep;
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result,
+ * which is confirmed by the fact that cstep's step_id is
+ * greater than step_id and the fact that results of the
+ * individual steps are evaluated in sequence of their
+ * step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets =
+ bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /*
+ * Update whether to scan null and default partitions.
+ */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default =
+ step_result->scan_default;
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Set *value to the constant value if 'expr' provides one
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bounds
+ * Determine offset of the hash bound matching the specified value,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause
+ *
+ * In most cases, the result would contain just one bound's offset, although
+ * the set may be empty if the corresponding hash partition has not been
+ * created. Also, if an insufficient number of values was provided, all bounds
+ * are returned.
+ *
+ * 'nvalues', if non-zero, denotes the number of values contained in 'values'
+ *
+ * 'values' contains values to be used for pruning appearing in the array in
+ * respective partition key position.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->bound_offsets =
+ bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ /*
+ * There is neither a special hash null partition nor a default hash
+ * partition.
+ */
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'value' contains the value to use for pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result->scan_null = result->scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default if any.
+ */
+ if (nvalues == 0)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /* Special case handling of values coming from a <> operator clause. */
+ if (opstrategy == InvalidStrategy)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
+ value, &is_equal);
+ if (off >= 0 && is_equal)
+ {
+
+ /* All bounds except this one qualify. */
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_del_member(result->bound_offsets,
+ off);
+ }
+
+ /* Always include the default partition if any. */
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ return result;
+ }
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_make_singleton(off);
+ }
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the number of datums we have partitions
+ * for, the only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * If off is negative, no datum of a non-default partition satisfies
+ * the clause. The only possible partition that could contain a match
+ * is the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If the default partition needs to be scanned for the given values, set
+ * scan_default in result if one is present.
+ *
+ * 'nvalues', if non-zero, should be <= context->partnatts
+ *
+ * 'values' contains values for partition keys (or a prefix) to be used for
+ * pruning
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used to perform partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result->scan_null = result->scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /*
+ * Look for the smallest bound that is = look-up value.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them; find the others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat off as the offset of the smallest bound to
+ * be included in the result, if we know it is the upper
+ * bound of the partition in which the look-up value could
+ * possibly exist. One case it couldn't is if the bound,
+ * or precisely the matched portion of its prefix, is not
+ * inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them; find the others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up value is inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up value and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain the look-up value. If
+ * the look-up value exactly matched the bound, but it
+ * isn't inclusive, there is no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff > maxoff)
+ return result;
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c3efca3c45..450c64d6fc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2136,6 +2136,37 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5059,6 +5090,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4157e7eb9a..c3f1789ce2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d962670e..efd0a71a2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1710,6 +1710,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -3958,6 +3980,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0cdb..25874074a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2596,6 +2622,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..fd89c7cfee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1145,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..cc98dc5ea0
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1672 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Provides functions to match the provided set of clauses with the
+ * partition key and to generate partition pruning "steps" from them
+ *
+ * If the "steps" contain enough information, partitions can be pruned right
+ * away in this module, which is called "static pruning", as all the needed
+ * information is statically available in the query being planned. Otherwise,
+ * they'd need to be delivered to the executor where the missing information
+ * can be filled and pruning tried one more time, which would be called
+ * "dynamic pruning".
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ bool isopne; /* is clause's original operator <> ? */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PartitionPruneStep *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool isopne,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return, in which case the caller should
+ * consider all partitions as pruned.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (rel->has_default_part && rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions is also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the global list context->steps and
+ * each one gets an identifier which is unique across all recursive
+ * invocations.
+ *
+ * If, while going through clauses, we find one that is marked pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS];
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the combine
+ * step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which, if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key().
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ Assert(pc != NULL);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * generate_opsteps set to false means no OpExprs were directly present in
+ * the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result =
+ lappend(result,
+ generate_pruning_step_op(context, 0, false, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result =
+ lappend(result,
+ generate_pruning_step_op(context, 0, false,
+ NIL, NIL, NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an AND combine step, if there are more than 1.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Match a given clause with the specified partition key
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+ bool is_opne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_opne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_opne_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = palloc0(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ (*pc)->isopne = false;
+ if (is_opne_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ (*pc)->isopne = true;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ *rightop = NULL;
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary equality
+ * clause, there should be an IS NULL clause; otherwise pruning is not
+ * possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this one is
+ * of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't look.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of doing.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->isopne,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column (which may
+ * not be the last partition key column). Actually, the
+ * last element of eq_clauses must give us what we need.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * But there might be multiple clauses that we matched to
+ * that column; go to the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix takes care of doing.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys are NULL.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ false,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate list of PartitionPruneStepOp steps each consisting of given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context,
+ step_opstrategy, step_isopne,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return list_make1(step);
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_isopne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contain the expressions and cmpfns
+ * we've generated so far from the clauses for the still earlier columns.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_isopne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_isopne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result =
+ lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy, step_isopne,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to a global context->steps and receives a globally
+ * unique identifier that's sourced from context->next_step_id.
+ */
+
+static PartitionPruneStep *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool isopne,
+ List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+
+ /*
+ * For clauses that contain an <> operator, set opstrategy to
+ * InvalidStrategy to signal get_matching_list_bounds to do the
+ * right thing.
+ */
+ if (isopne)
+ {
+ Assert(opstrategy == BTEqualStrategyNumber);
+ opstep->opstrategy = InvalidStrategy;
+ }
+ else
+ opstep->opstrategy = opstrategy;
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+static PartitionPruneStep *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b46b33d4f7..32e973385d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,9 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->has_default_part =
+ OidIsValid(get_default_oid_from_partdesc(partdesc));
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..7f1428b8d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->has_default_part = false;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +569,8 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +738,10 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->has_default_part = false;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..8981901272 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -42,6 +42,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -74,4 +96,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fce48026b6..4df979e9eb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..c9d2187631 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,77 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of
+ * comparison functions used to compare aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items up to partnatts items.
+ *
+ * Once we find the offset of a partition bound using the look-up key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 83b03b41e4..9b9aabddef 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,8 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ bool has_default_part; /* does it have a default partition? */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index f56151fc1e..d799acb91f 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1926,11 +1926,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..0be31cce7e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1111,13 +1098,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1270,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1332,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..8377671cde 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -239,3 +239,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if either a value or an IS NULL clause is provided for each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitioning pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d8a44cd9e..aa2ec281c4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1587,6 +1588,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1599,6 +1601,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1752,6 +1758,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
v49-0004-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From 9fa876f0c7004bda95d7d925ac53293fd153c28f Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v49 4/4] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of the partitioned
tables affected by that node. Currently, the list records the indexes
of *all* partitioned tables in a given partition tree, because it is
constructed during an earlier planning phase, when it's not yet known
which of those tables will actually be affected by the plan.
Instead, construct the list just before the Path for such a plan is
built, by which time pruning has determined which tables remain.
That means we no longer need the PartitionedChildRelInfo node or the
related code in prepunion.c.
---
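To illustrate the idea in the commit message, here is a minimal standalone sketch of the "bubble up" scheme. It is not code from the patch. ToyRel, toy_set_rel_size() and toy_set_rel_pathlist() are hypothetical stand-ins for RelOptInfo, set_append_rel_size() and set_append_rel_pathlist(), and the fixed-size arrays are an assumption made only to keep the example short. A partitioned rel starts with its own RT index in its list, and only children that were not pruned (not marked dummy) contribute their lists to the parent.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RELS 16

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL struct. */
typedef struct ToyRel
{
	int			rti;			/* range table index of this rel */
	bool		is_partitioned; /* is this a partitioned table? */
	bool		is_dummy;		/* was it pruned (proven empty)? */
	int			nchildren;
	struct ToyRel *children[MAX_RELS];
	int			npartitioned;	/* length of partitioned_child_rels */
	int			partitioned_child_rels[MAX_RELS];
} ToyRel;

/* Size phase: a partitioned rel's list starts out with its own index. */
void
toy_set_rel_size(ToyRel *rel)
{
	rel->npartitioned = 0;
	if (rel->is_partitioned)
		rel->partitioned_child_rels[rel->npartitioned++] = rel->rti;

	for (int i = 0; i < rel->nchildren; i++)
		toy_set_rel_size(rel->children[i]);
}

/* Pathlist phase: bubble up the lists of live (unpruned) children only. */
void
toy_set_rel_pathlist(ToyRel *rel)
{
	for (int i = 0; i < rel->nchildren; i++)
	{
		ToyRel	   *child = rel->children[i];

		/* A pruned child contributes nothing to the parent's list. */
		if (child->is_dummy)
			continue;

		toy_set_rel_pathlist(child);

		/* Only partitioned parents collect indexes, as in the patch. */
		if (!rel->is_partitioned)
			continue;

		for (int j = 0; j < child->npartitioned; j++)
		{
			assert(rel->npartitioned < MAX_RELS);
			rel->partitioned_child_rels[rel->npartitioned++] =
				child->partitioned_child_rels[j];
		}
	}
}
```

Calling toy_set_rel_size() and then toy_set_rel_pathlist() on the root leaves the root's list holding its own index plus the indexes of the partitioned descendants that survived pruning, which is roughly what partitioned_rels is meant to carry in the resulting path.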
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 450c64d6fc..b0fa556f71 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2295,21 +2295,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5115,9 +5100,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 45ceba2830..28eecbbf08 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3214,9 +3204,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index efd0a71a2c..e6793b4716 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2296,6 +2295,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2345,6 +2345,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2570,16 +2571,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4127,9 +4118,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd89c7cfee..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -884,6 +884,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1337,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1347,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1374,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1435,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 7f1428b8d8..5ee67ba121 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -159,6 +159,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -574,6 +575,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -745,6 +747,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4df979e9eb..1ec8030d4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 9b9aabddef..afe1faf2ea 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of the unpruned partitions of
+ * this relation that are themselves
+ * partitioned tables
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -675,6 +679,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2130,27 +2135,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa2ec281c4..adde8eaee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1611,7 +1611,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
On 4 April 2018 at 17:42, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
I'm not sure about the following change in your patch:
- if (!result->scan_null)
- result->scan_null = step_result->scan_null;
- if (!result->scan_default)
- result->scan_default = step_result->scan_default;
+ result->scan_null |= step_result->scan_null;
+ result->scan_default |= step_result->scan_default;
Afaik, |= does bitwise OR, which even if it might give the result we want,
is not a logical operation. I had written the original code using the
following definition of logical OR.
a OR b = if a then true else b
Ok, no problem. I only changed that to make it more compact.
For the record, we do the same in plenty of other places in the code base:
E.g.
parse->hasSubLinks |= subquery->hasSubLinks;
/* If subquery had any RLS conditions, now main query does too */
parse->hasRowSecurity |= subquery->hasRowSecurity;
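To make the point above concrete, here is a minimal sketch of the two forms side by side. It assumes C99 bool from <stdbool.h>, so both operands are always 0 or 1; under that assumption the bitwise OR and the "if a then true else b" definition agree for every input. The helper names or_conditional, or_compound and check_equivalence are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the original form, "a OR b = if a then true else b". */
static bool
or_conditional(bool a, bool b)
{
	if (!a)
		a = b;
	return a;
}

/* Hypothetical helper: the compact form using compound bitwise OR. */
static bool
or_compound(bool a, bool b)
{
	a |= b;
	return a;
}

/* Check that both forms agree on all four combinations of inputs. */
static void
check_equivalence(void)
{
	int			a;
	int			b;

	for (a = 0; a <= 1; a++)
		for (b = 0; b <= 1; b++)
			assert(or_conditional(a != 0, b != 0) == or_compound(a != 0, b != 0));
}
```

Calling check_equivalence() from a test program exercises all four cases. The equivalence relies on the operands holding only 0 or 1; it would not follow for arbitrary integer flags.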
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4 April 2018 at 19:04, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Attached v49.
Thanks for including the changes. I'll look now.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 4 April 2018 at 19:04, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/04 14:42, Amit Langote wrote:
Attached v48.
I had forgotten to remove the static_pruning parameter I had added in
v47, because it is no longer used. Static pruning now occurs even if a
step contains all Params, in which case each of the
get_matching_hash/list/range_bounds() functions returns offsets of all
non-null datums, because the Params cannot be resolved to actual values
during static pruning.
Thanks for updating. I've made a pass over v49 and I didn't find very
much wrong with it.
The only real bug I found was a missing IsA(rinfo->clause, Const) in
the pseudoconstant check inside
generate_partition_pruning_steps_internal.
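As an illustration of why that check matters, here is a minimal sketch using toy stand-ins for the node types. Everything prefixed with Toy or toy_ is hypothetical and is not the PostgreSQL definition; the only thing taken from the patch is the shape of the condition: test the node's tag before casting it to a Const and reading its value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for node tags; not the real NodeTag values. */
typedef enum ToyNodeTag
{
	TOY_CONST,
	TOY_OPEXPR
} ToyNodeTag;

typedef struct ToyNode
{
	ToyNodeTag	tag;
} ToyNode;

typedef struct ToyConst
{
	ToyNode		node;			/* first member, so the tag can be inspected */
	bool		constvalue;
} ToyConst;

typedef struct ToyOpExpr
{
	ToyNode		node;
	int			opno;
} ToyOpExpr;

typedef struct ToyRestrictInfo
{
	bool		pseudoconstant;
	ToyNode    *clause;
} ToyRestrictInfo;

/*
 * Return true only for a pseudoconstant clause that is a constant "false".
 * The tag test must come before the cast: a pseudoconstant clause is not
 * necessarily a constant, and reading constvalue from another node type
 * would interpret unrelated memory.
 */
static bool
toy_is_const_false(const ToyRestrictInfo *rinfo)
{
	assert(rinfo != NULL && rinfo->clause != NULL);

	return rinfo->pseudoconstant &&
		rinfo->clause->tag == TOY_CONST &&
		!((const ToyConst *) rinfo->clause)->constvalue;
}
```

With the tag test in place, a pseudoconstant clause that is some other node type is simply not treated as constant false, rather than being cast blindly.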
Most of the changes are comment fixes, with a few stylistic changes
thrown in, which are pretty much all there just to try to shrink the code
by a line or two or to reduce indentation.
I feel pretty familiar with this code now and, assuming the attached is
included, I'm happy for someone else, hopefully a committer, to take a
look at it.
I'll leave the following notes:
1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo are both available to look at; it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.
2. Don't really like the new isopne variable name. It's not very
simple to decode; perhaps something like is_not_eq would be better?
3. The part of the code I'm least familiar with is
get_steps_using_prefix_recurse(). I admit to not having had time to
fully understand that and consider ways to break it.
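For readers trying to follow note 3, here is a toy model of the idea described in the patch's comments: when the prefix holds several clauses for the same key, one combination is produced per way of picking a single clause for each key. This is only a sketch under that reading of the comments. ToyClause, TOY_MAX_KEYS and toy_prefix_combinations are hypothetical names, and the real function builds pruning steps rather than arrays of integers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_KEYS 4

/* Toy clause: the partition key it was matched to, and an expression id. */
typedef struct ToyClause
{
	int			keyno;
	int			expr_id;
} ToyClause;

/*
 * Enumerate every way of choosing one clause per key from 'prefix', which
 * must be sorted by keyno.  'start' is the index of the first clause of the
 * current key, 'chosen' holds the expression ids picked for earlier keys and
 * 'depth' is how many have been picked.  Each finished combination is copied
 * into out[nout]; the updated count is returned.
 */
static int
toy_prefix_combinations(const ToyClause *prefix, int nprefix, int start,
						int *chosen, int depth,
						int (*out)[TOY_MAX_KEYS], int max_out, int nout)
{
	int			cur_keyno;
	int			next;
	int			i;

	if (start >= nprefix)
	{
		/* A clause has been chosen for every key: emit the combination. */
		assert(nout < max_out);
		for (i = 0; i < depth; i++)
			out[nout][i] = chosen[i];
		return nout + 1;
	}

	assert(depth < TOY_MAX_KEYS);
	cur_keyno = prefix[start].keyno;

	/* Find where the clauses for the next key begin. */
	next = start;
	while (next < nprefix && prefix[next].keyno == cur_keyno)
		next++;

	/* Try each clause of the current key, then recurse into later keys. */
	for (i = start; i < next; i++)
	{
		chosen[depth] = prefix[i].expr_id;
		nout = toy_prefix_combinations(prefix, nprefix, next, chosen,
									   depth + 1, out, max_out, nout);
	}
	return nout;
}
```

For example, two clauses on key 0 and one clause on key 1 yield two combinations, one for each clause on key 0 paired with the single clause on key 1.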
Marking as ready for committer.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v49_fixes_drowley.patch (application/octet-stream)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index e1ffc5271f..d6bce9f348 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1784,7 +1784,7 @@ perform_pruning_base_step(PartitionPruneContext *context,
/*
* Generate the partition look-up key that will be used by one of
- * the get_partitions_from_keys_* functions called below.
+ * the get_matching_*_bounds functions called below.
*/
for (keyno = 0; keyno < context->partnatts; keyno++)
{
@@ -1884,12 +1884,11 @@ perform_pruning_combine_step(PartitionPruneContext *context,
{
ListCell *lc1;
PruneStepResult *result = NULL;
+ bool firststep;
/*
- * In some cases, the planner generates a combine step that doesn't
- * contain any argument steps, to signal us to not prune any partitions.
- * So, return indexes of all datums in that case, including null and/or
- * default partition, if any.
+ * A combine step without any source steps is an indication to not perform
+ * any partition pruning, we just return all partitions.
*/
result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
if (list_length(cstep->source_stepids) == 0)
@@ -1901,86 +1900,77 @@ perform_pruning_combine_step(PartitionPruneContext *context,
result->scan_null = partition_bound_accepts_nulls(boundinfo);
return result;
}
- else
- {
- bool firststep;
- switch (cstep->combineOp)
- {
- case COMBINE_UNION:
- foreach(lc1, cstep->source_stepids)
- {
- int step_id = lfirst_int(lc1);
- PruneStepResult *step_result;
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
- /*
- * step_results[step_id] must contain a valid result,
- * which is confirmed by the fact that cstep's step_id is
- * greater than step_id and the fact that results of the
- * individual steps are evaluated in sequence of their
- * step_ids.
- */
- if (step_id >= cstep->step.step_id)
- elog(ERROR, "invalid pruning combine step argument");
- step_result = step_results[step_id];
- Assert(step_result != NULL);
+ /*
+ * step_results[step_id] must contain a valid result, which is
+ * confirmed by the fact that cstep's step_id is greater than
+ * step_id and the fact that results of the individual steps
+ * are evaluated in sequence of their step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
- /* Record any additional datum indexes from this step */
- result->bound_offsets =
- bms_add_members(result->bound_offsets,
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets = bms_add_members(result->bound_offsets,
step_result->bound_offsets);
- /* Update whether to scan null and default partitions. */
- if (!result->scan_null)
- result->scan_null = step_result->scan_null;
- if (!result->scan_default)
- result->scan_default = step_result->scan_default;
- }
- break;
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
- case COMBINE_INTERSECT:
- firststep = true;
- foreach(lc1, cstep->source_stepids)
- {
- int step_id = lfirst_int(lc1);
- PruneStepResult *step_result;
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
- if (step_id >= cstep->step.step_id)
- elog(ERROR, "invalid pruning combine step argument");
- step_result = step_results[step_id];
- Assert(step_result != NULL);
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
- if (firststep)
- {
- /* Copy step's result the first time. */
- result->bound_offsets = step_result->bound_offsets;
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (result->scan_null)
result->scan_null = step_result->scan_null;
+ if (result->scan_default)
result->scan_default = step_result->scan_default;
- firststep = false;
- }
- else
- {
- /* Record datum indexes common to both steps */
- result->bound_offsets =
- bms_int_members(result->bound_offsets,
- step_result->bound_offsets);
-
- /*
- * Update whether to scan null and default partitions.
- */
- if (result->scan_null)
- result->scan_null = step_result->scan_null;
- if (result->scan_default)
- result->scan_default =
- step_result->scan_default;
- }
}
- break;
+ }
+ break;
- default:
- elog(ERROR, "invalid pruning combine op: %d",
- (int) cstep->combineOp);
- }
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
}
return result;
@@ -1988,7 +1978,8 @@ perform_pruning_combine_step(PartitionPruneContext *context,
/*
* partkey_datum_from_expr
- * Set *value to the constant value if 'expr' provides one
+ * Evaluate 'expr', set *value to the resulting Datum. Return true if
+ * evaluation was possible, otherwise false.
*/
static bool
partkey_datum_from_expr(PartitionPruneContext *context,
@@ -2009,26 +2000,25 @@ partkey_datum_from_expr(PartitionPruneContext *context,
/*
* get_matching_hash_bounds
- * Determine offset of the hash bound matching the specified value,
+ * Determine offset of the hash bound matching the specified values,
* considering that all the non-null values come from clauses containing
- * a compatible hash eqaulity operator and any keys that are null come
- * from an IS NULL clause
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause.
*
- * In most cases, the result would contain just one bound's offset, although
- * the set may be empty if the corresponding hash partition has not been
- * created. Also, if insufficient number of values were provided, all bounds
- * are returned.
+ * Generally this function will return a single matching bound offset,
+ * although if a partition has not been setup for a given modulus then we may
+ * return no matches. If the number of clauses found don't cover the entire
+ * partition key, then we'll need to return all offsets.
*
- * 'nvalues', if non-zero, denotes the number of values contained in 'values'
-
- * 'values' contains values to be used for pruning appearing in the array in
- * respective partition key position.
-
* 'opstrategy' if non-zero must be HTEqualStrategyNumber.
-
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', the number of Datums in the 'values' array.
+ *
* 'partsupfunc' contains partition hashing functions that can produce correct
- * hash for the type of the values contained in 'values'
-
+ * hash for the type of the values contained in 'values'.
+ *
* 'nullkeys' is the set of partition keys that are null.
*/
static PruneStepResult *
@@ -2087,15 +2077,16 @@ get_matching_hash_bounds(PartitionPruneContext *context,
* get_matching_list_bounds
* Determine the offsets of list bounds matching the specified value,
* according to the semantics of the given operator strategy
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'value' contains the value to use for pruning.
*
* 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
-
- * 'value' contains the value to use for pruning
-
- * 'opstrategy' if non-zero must be a btree strategy number
-
+ *
* 'partsupfunc' contains the list partitioning comparison function to be used
* to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
*/
static PruneStepResult *
get_matching_list_bounds(PartitionPruneContext *context,
@@ -2159,15 +2150,18 @@ get_matching_list_bounds(PartitionPruneContext *context,
/* Speical case handling of values coming from a <> operator clause. */
if (opstrategy == InvalidStrategy)
{
+ /*
+ * First match to all bounds. We'll remove any matching datums below.
+ */
result->bound_offsets = bms_add_range(NULL, 0,
- boundinfo->ndatums - 1);;
+ boundinfo->ndatums - 1);
off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
value, &is_equal);
if (off >= 0 && is_equal)
{
- /* All bounds except this one qualify. */
+ /* We have a match. Remove from the result. */
Assert(boundinfo->indexes[off] >= 0);
result->bound_offsets = bms_del_member(result->bound_offsets,
off);
@@ -2284,13 +2278,12 @@ get_matching_list_bounds(PartitionPruneContext *context,
* If default partition needs to be scanned for given values, set scan_default
* in result if present.
*
- * 'nvalues', if non-zero, should be <= context->partnatts - 1
-
- * 'values' contains values for partition keys (or a prefix) to be used for
- * pruning
-
- * 'opstrategy' if non-zero must be a btree strategy number
-
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', number of Datums in 'values' array. Must be <= context->partnatts.
+ *
* 'partsupfunc' contains the range partitioning comparison functions to be
* used to perform partition_range_datum_bsearch or partition_rbound_datum_cmp
* using.
@@ -2361,9 +2354,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
switch (opstrategy)
{
case BTEqualStrategyNumber:
- /*
- * Look for the smallest bound that is = look-up value.
- */
+ /* Look for the smallest bound that is = look-up value. */
off = partition_range_datum_bsearch(partsupfunc,
partcollation,
boundinfo,
@@ -2420,12 +2411,12 @@ get_matching_range_bounds(PartitionPruneContext *context,
boundinfo->kind[off],
values, nvalues));
/*
- * We can treat off as the offset of the smallest bound to
- * be included in the result, if we know it is the upper
- * bound of the partition in which the look-up value could
- * possibly exist. One case it couldn't is if the bound,
- * or precisely the matched portion of its prefix, is not
- * inclusive.
+ * We can treat 'off' as the offset of the smallest bound
+ * to be included in the result, if we know it is the
+ * upper bound of the partition in which the look-up value
+ * could possibly exist. One case it couldn't is if the
+ * bound, or precisely the matched portion of its prefix,
+ * is not inclusive.
*/
if (boundinfo->kind[off][nvalues] ==
PARTITION_RANGE_DATUM_MINVALUE)
@@ -2552,11 +2543,11 @@ get_matching_range_bounds(PartitionPruneContext *context,
* offset of one of them, find others by checking adjacent
* bounds.
*
- * Based on whether the look-up values is inclusive or
+ * Based on whether the look-up values are inclusive or
* not, we must either include the indexes of all such
* bounds in the result (that is, set minoff to the index
* of smallest such bound) or find the smallest one that's
- * greater than the look-up value and set minoff to that.
+ * greater than the look-up values and set minoff to that.
*/
while (off >= 1 && off < boundinfo->ndatums - 1)
{
@@ -2729,9 +2720,8 @@ get_matching_range_bounds(PartitionPruneContext *context,
}
Assert(minoff >= 0 && maxoff >= 0);
- if (minoff > maxoff)
- return result;
- result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ if (minoff <= maxoff)
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
return result;
}
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index cc98dc5ea0..75b7232f5d 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -1,15 +1,10 @@
/*-------------------------------------------------------------------------
*
* partprune.c
- * Provides the functionality to match the provided set of clauses with
- * the partition key to partition pruning "steps"
- *
- * If the "steps" contain enough information, partitions can be pruned right
- * away in this module, which is called "static pruning", as all the needed
- * information is statically available in the query being planned. Otherwise,
- * they'd need to be delivered to the executor where the missing information
- * can be filled and pruning tried one more time, which would be called
- * "dynamic pruning".
+ * Parses clauses attempting to match them up to partition keys of a
+ * given relation and generates a set of "pruning steps", which can be
+ * later "executed" either from the planner or the executor to determine
+ * the minimum set of partitions which match the given clauses.
*
* Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -186,9 +181,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* generate_partition_pruning_steps
* Processes 'clauses' and returns a list of "partition pruning steps"
*
- * If static_pruning is true, include in the result only steps that contain at
- * least one Const. If any of the clause in the input list is a
- * pseudo-constant "false", *constfalse is set to true upon return.
+ * If any of the clause in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
*/
List *
generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
@@ -253,9 +247,8 @@ generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
* arguments and generate PartitionPruneStepCombine step that will combine
* results of those steps.
*
- * All of the generated steps are added to the global array context->steps and
- * each one gets an identifier which is unique across all recursive
- * invocations.
+ * All of the generated steps are added to the context's steps List and each
+ * one gets an identifier which is unique across all recursive invocations.
*
* If when going through clauses, we find any that are marked as pseudoconstant
* and contains a constant false value, we stop generating any further steps
@@ -294,6 +287,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
clause = rinfo->clause;
if (rinfo->pseudoconstant &&
+ IsA(rinfo->clause, Const) &&
!DatumGetBool(((Const *) clause)->constvalue))
{
*constfalse = true;
@@ -309,8 +303,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
*
* While steps generated for the arguments themselves will be
* added to context->steps during recursion and will be evaluated
- * indepdently, collect their step IDs to be stored in the combine
- * step we'll be creating.
+ * independently, collect their step IDs to be stored in the
+ * combine step we'll be creating.
*/
if (or_clause((Node *) clause))
{
@@ -353,7 +347,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
* In case of the latter, we cannot prune using such
* an arg. To indicate that to the pruning code, we
* must construct a dummy PartitionPruneStepCombine
- * whose source_stepids is set to to an empty List.
+ * whose source_stepids is set to an empty List.
* However, if we can prove using constraint exclusion
* that the clause refutes the table's partition
* constraint (if it's sub-partitioned), we need not
@@ -440,7 +434,9 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
/*
* Fall-through for a NOT clause, which if it's a Boolean clause
- * clause, will be handled in match_clause_to_partition_key().
+ * clause, will be handled in match_clause_to_partition_key(). We
+ * currently don't perform any pruning for more complex NOT
+ * clauses.
*/
}
@@ -538,8 +534,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
}
/*
- * generate_opsteps set to false means no OpExprs were directly present in
- * the input list.
+ * If generate_opsteps is set to false it means no OpExprs were directly
+ * present in the input list.
*/
if (!generate_opsteps)
{
@@ -582,7 +578,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
/*
* Finally, results from all entries appearing in result should be
- * combined using an AND combine step, if there are more than 1.
+ * combined using an INTERSECT combine step, if there are more than 1.
*/
if (list_length(result) > 1)
{
@@ -616,7 +612,7 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
/*
* match_clause_to_partition_key
- * Match a given clause with the specified partition key
+ * Attempt to match the given 'clause' with the specified partition key.
*
* Return value:
*
@@ -669,13 +665,15 @@ match_clause_to_partition_key(RelOptInfo *rel,
*/
if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
{
- *pc = palloc0(sizeof(PartClauseInfo));
+ *pc = palloc(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
/* Do pruning with the Boolean equality operator. */
(*pc)->opno = BooleanEqualOperator;
+ (*pc)->isopne = false;
(*pc)->expr = expr;
/* We know that expr is of Boolean type. */
(*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+ (*pc)->op_strategy = InvalidStrategy;
return PARTCLAUSE_MATCH_CLAUSE;
}
@@ -808,16 +806,23 @@ match_clause_to_partition_key(RelOptInfo *rel,
else
cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
- *pc = palloc0(sizeof(PartClauseInfo));
+ *pc = palloc(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
/* For <> operator clauses, pass on the negator. */
(*pc)->isopne = false;
+ (*pc)->op_strategy = InvalidStrategy;
+
if (is_opne_listp)
{
Assert(OidIsValid(negator));
(*pc)->opno = negator;
(*pc)->isopne = true;
+ /*
+ * We already know the strategy in this case, so may as well set
+ * it rather than having to look it up later.
+ */
+ (*pc)->op_strategy = BTEqualStrategyNumber;
}
/* And if commuted before matching, pass on the commutator */
else if (OidIsValid(commutator))
@@ -1033,10 +1038,11 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
{
Expr *leftop;
+ *rightop = NULL;
+
if (!IsBooleanOpfamily(partopfamily))
return false;
- *rightop = NULL;
if (IsA(clause, BooleanTest))
{
BooleanTest *btest = (BooleanTest *) clause;
@@ -1124,9 +1130,9 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
break;
/*
- * For hash partitioning, if a column doesn't have necessary equality
- * clause, there should be an IS NULL clause, otherwise pruning is not
- * possible.
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
*/
if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
clauselist == NIL && !bms_is_member(i, nullkeys))
@@ -1220,7 +1226,7 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
/*
* If we've decided that clauses for subsequent partition keys
- * wouldn't be useful for pruning, don't look.
+ * wouldn't be useful for pruning, don't search any further.
*/
if (!consider_next_key)
break;
@@ -1252,7 +1258,7 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
* contain multiple clauses for the same key, in which case,
* we must generate steps for various combinations of
* expressions of different keys, which get_steps_using_prefix
- * takes care of doing.
+ * takes care of for us.
*/
for (i = 0; i < BTMaxStrategyNumber; i++)
{
@@ -1357,15 +1363,16 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
ListCell *lc1;
/*
- * Locate the clause for the greatest column (which may
- * not be the last partition key column). Actually, the
- * last element of eq_clauses must give us what we need.
+ * Locate the clause for the greatest column. This may
+ * not belong to the last partition key, but it is the
+ * clause belonging to the last partition key we found a
+ * clause for above.
*/
pc = llast(eq_clauses);
/*
- * But there might be multiple clauses that we matched to
- * that column; go to the first such clause. While at it,
+ * There might be multiple clauses which matched to that
+ * partition key; find the first such clause. While at it,
* add all the clauses before that one to 'prefix'.
*/
last_keyno = pc->keyno;
@@ -1385,7 +1392,7 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
* contain multiple clauses for the same key, in which
* case, we must generate steps for various combinations
* of expressions of different keys, which
- * get_steps_using_prefix takes care of doing.
+ * get_steps_using_prefix will take care of for us.
*/
for_each_cell(lc1, lc)
{
@@ -1394,7 +1401,8 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
/*
* Note that we pass nullkeys for step_nullkeys,
* because we need to tell hash partition bound search
- * function which of the keys are NULL.
+ * function which of the keys we found IS NULL clauses
+ * for.
*/
Assert(pc->op_strategy == HTEqualStrategyNumber);
pc_steps =
@@ -1432,7 +1440,7 @@ generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
if (opstep_ids != NIL)
return generate_pruning_step_combine(context, opstep_ids,
- COMBINE_INTERSECT);
+ COMBINE_INTERSECT);
return NULL;
}
else if (opsteps != NIL)
@@ -1496,7 +1504,7 @@ get_steps_using_prefix(GeneratePruningStepsContext *context,
*
* 'start' is where we should start iterating for the current invocation.
* 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
- * we've generated so far from the clauses for the still earlier columns.
+ * we've generated so far from the clauses for the previous part keys.
*/
static List *
get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
@@ -1618,11 +1626,10 @@ get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
}
/*
- * Following functions generate pruning steps of various types. Each step
- * that's created is added to a global context->steps and receive a globally
- * unique identifier that's sourced from context->next_step_id.
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to a context's 'steps' List and receives unique
+ * step identifier.
*/
-
static PartitionPruneStep *
generate_pruning_step_op(GeneratePruningStepsContext *context,
int opstrategy, bool isopne,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index c9d2187631..965eb656a8 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1527,13 +1527,13 @@ typedef struct PartitionPruneStep
* This contains information extracted from up to partnatts OpExpr clauses,
* where partnatts is the number of partition key columns. 'opstrategy' is the
* strategy of the operator in the clause matched to the last partition key.
- * 'exprs' contains expressions which comprise the look-up key to be passed to
+ * 'exprs' contains expressions which comprise the lookup key to be passed to
* the partition bound search function. 'cmpfns' contains the OIDs of
* comparison function used to compare aforementioned expressions with
* partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
* items up to partnatts items.
*
- * Once we find the offset of a partition bound using the look-up key, we
+ * Once we find the offset of a partition bound using the lookup key, we
* determine which partitions to include in the result based on the value of
* 'opstrategy'. For example, if it were equality, we'd return just the
* partition that would contain that key or a set of partitions if the key
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 0be31cce7e..2d77b3edd4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1079,8 +1079,7 @@ explain (costs off) select * from boolpart where a is not unknown;
--
-- pruning for partitioned table appearing inside a sub-query
--
--- pruning won't work for mc3p, because the leading key (a) is compared to a
--- Param, which turns off the static pruning
+-- pruning won't work for mc3p, because some keys are Params
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
QUERY PLAN
-----------------------------------------------------------------------
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 8377671cde..ad5177715c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -159,9 +159,7 @@ explain (costs off) select * from boolpart where a is not unknown;
--
-- pruning for partitioned table appearing inside a sub-query
--
-
--- pruning won't work for mc3p, because the leading key (a) is compared to a
--- Param, which turns off the static pruning
+-- pruning won't work for mc3p, because some keys are Params
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
-- pruning should work fine, because values for a prefix of keys (a, b) are
Hi,
On 04/04/2018 09:29 AM, David Rowley wrote:
Thanks for updating. I've made a pass over v49 and I didn't find very
much wrong with it.
The only real bug I found was a missing IsA(rinfo->clause, Const) in
the pseudoconstant check inside
generate_partition_pruning_steps_internal.
Most of the changes are comment fixes, with a few stylistic changes
thrown in, which are pretty much all there just to try to shrink the code
by a line or two or to reduce indentation.
I feel pretty familiar with this code now and, assuming the attached is
included, I'm happy for someone else, hopefully a committer, to take a
look at it.
I'll leave the following notes:
1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo are both available to look at; it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.
2. Don't really like the new isopne variable name. It's not very
simple to decode; perhaps something like is_not_eq would be better?
3. The part of the code I'm least familiar with is
get_steps_using_prefix_recurse(). I admit to not having had time to
fully understand that and consider ways to break it.
Marking as ready for committer.
Passes check-world, and CommitFest app has been updated to reflect the
current patch set. Trivial changes attached.
Best regards,
Jesper
Attachments:
v49_fixes_jpedersen.patch (text/x-patch)
From 82f718579dc8e06ab77d76df4ed72f0f03ed4a4e Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Wed, 4 Apr 2018 11:27:59 -0400
Subject: [PATCH] Trivial changes
---
src/backend/catalog/partition.c | 10 +++++-----
src/backend/optimizer/util/partprune.c | 8 ++++----
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d6bce9f348..7a268e05dc 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -146,7 +146,7 @@ typedef struct PruneStepResult
{
/*
* This contains the offsets of the bounds in a table's boundinfo, each of
- * which is a bound whose corresponding partition is selected by a a given
+ * which is a bound whose corresponding partition is selected by a given
* pruning step.
*/
Bitmapset *bound_offsets;
@@ -2026,7 +2026,7 @@ get_matching_hash_bounds(PartitionPruneContext *context,
int opstrategy, Datum *values, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys)
{
- PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
PartitionBoundInfo boundinfo = context->boundinfo;
int *partindices = boundinfo->indexes;
int partnatts = context->partnatts;
@@ -2093,7 +2093,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
int opstrategy, Datum value, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys)
{
- PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
PartitionBoundInfo boundinfo = context->boundinfo;
int off,
minoff,
@@ -2147,7 +2147,7 @@ get_matching_list_bounds(PartitionPruneContext *context,
return result;
}
- /* Speical case handling of values coming from a <> operator clause. */
+ /* Special case handling of values coming from a <> operator clause. */
if (opstrategy == InvalidStrategy)
{
/*
@@ -2295,7 +2295,7 @@ get_matching_range_bounds(PartitionPruneContext *context,
int opstrategy, Datum *values, int nvalues,
FmgrInfo *partsupfunc, Bitmapset *nullkeys)
{
- PruneStepResult *result = palloc0(sizeof(PruneStepResult));
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
PartitionBoundInfo boundinfo = context->boundinfo;
Oid *partcollation = context->partcollation;
int partnatts = context->partnatts;
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index 75b7232f5d..2d06c1a519 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -433,8 +433,8 @@ generate_partition_pruning_steps_internal(RelOptInfo *rel,
}
/*
- * Fall-through for a NOT clause, which if it's a Boolean clause
- * clause, will be handled in match_clause_to_partition_key(). We
+ * Fall-through for a NOT clause, which if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key(). We
* currently don't perform any pruning for more complex NOT
* clauses.
*/
@@ -665,7 +665,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
*/
if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
{
- *pc = palloc(sizeof(PartClauseInfo));
+ *pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
/* Do pruning with the Boolean equality operator. */
(*pc)->opno = BooleanEqualOperator;
@@ -806,7 +806,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
else
cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
- *pc = palloc(sizeof(PartClauseInfo));
+ *pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
(*pc)->keyno = partkeyidx;
/* For <> operator clauses, pass on the negator. */
--
2.13.6
Hi.
On 2018/04/05 0:45, Jesper Pedersen wrote:
Hi,
On 04/04/2018 09:29 AM, David Rowley wrote:
Thanks for updating. I've made a pass over v49 and I didn't find very
much wrong with it.
The only real bug I found was a missing IsA(rinfo->clause, Const) in
the pseudoconstant check inside
generate_partition_pruning_steps_internal.
Fixed.
Most of the changes are comment fixes with a few stylistic changes
thrown which are pretty much all there just to try to shrink the code
a line or two or reduce indentation.
I feel pretty familiar with this code now and assuming the attached is
included I'm happy for someone else, hopefully, a committer to take a
look at it.
Thank you, your changes look good to me.
I'll leave the following notes:
1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo is available to look at, it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.
Hmm, it might not be such a bad idea to bring out the
PartitionBoundInfoData into partition.h. If we do that, we won't need the
has_default_part that the patch adds to RelOptInfo.
In the Attached v50 set, 0002 does that.
2. Don't really like the new isopne variable name. It's not very
simple to decode, perhaps something like is_not_eq is better?
isopne does sound a bit unintelligible. I propose op_is_ne so that it
sounds consistent with the preceding member of the struct that's called
opno. I want to keep "ne" and not start calling it not_eq, as a few other
places use the string "ne" to refer to a similar thing, like:
/* inequality */
Datum
range_ne(PG_FUNCTION_ARGS)
Datum
timestamptz_ne_date(PG_FUNCTION_ARGS)
Since the field is local to partprune.c, I guess that it's fine as the
comment where it's defined tells what it is.
3. The part of the code I'm least familiar with is
get_steps_using_prefix_recurse(). I admit to not having had time to
fully understand that and consider ways to break it.
The purpose of that code is to generate *all* needed steps to be combined
using COMBINE_INTERSECT such that the pruning will occur using the most
restrictive set of clauses in cases where the same key is referenced in
multiple restriction clauses containing non-equality operators. So, for a
table foo that is range partitioned on (a, b) and a query like
explain select * from foo where a <= 1 and a <= 3 and b < 5 and b <= 10
the pruning steps generated to be combined with an enclosing INTERSECT step
will be as follows:
<= (1, 10)
< (1, 5)
<= (3, 10)
< (3, 5)
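The four steps listed above are the cross product of the clauses on a with the clauses on b. A rough sketch of that enumeration follows; it is not the code of get_steps_using_prefix_recurse(), and every name in it (OpSketch, ClauseSketch, StepSketch, sketch_steps_using_prefix) is made up for illustration. It assumes exactly two keys and takes each step's operator from the clause on the last key (b), which is my reading of the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator tags; only < and <= are needed for the example. */
typedef enum OpSketch { OP_LT, OP_LE } OpSketch;

/* One restriction clause on a single key: "key <op> value". */
typedef struct ClauseSketch
{
    OpSketch op;
    int value;
} ClauseSketch;

/* One pruning step: an operator applied to the tuple (a_value, b_value). */
typedef struct StepSketch
{
    OpSketch op;
    int a_value;
    int b_value;
} StepSketch;

/*
 * Emit one step per (clause on a, clause on b) pair.  The step's operator
 * comes from the clause on b, the last key.  Returns the number of steps
 * written to 'out'.
 */
static size_t
sketch_steps_using_prefix(const ClauseSketch *a_clauses, size_t na,
                          const ClauseSketch *b_clauses, size_t nb,
                          StepSketch *out, size_t out_cap)
{
    size_t n = 0;
    size_t i, j;

    assert(na * nb <= out_cap);
    for (i = 0; i < na; i++)
    {
        for (j = 0; j < nb; j++)
        {
            out[n].op = b_clauses[j].op;
            out[n].a_value = a_clauses[i].value;
            out[n].b_value = b_clauses[j].value;
            n++;
        }
    }
    return n;
}
```

Called with the a clauses {<= 1, <= 3} and the b clauses {<= 10, < 5}, the sketch produces the same four tuples as the listing, and the enclosing INTERSECT step then keeps only the partitions that satisfy all of them.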
Marking as ready for committer.
Thank you!
Passes check-world, and CommitFest app has been updated to reflect the
current patch set. Trivial changes attached.
Merged these changes. Thanks again Jesper.
Attached v50.
Thanks,
Amit
On 2018/04/05 17:28, Amit Langote wrote:
Attached v50.
Really attached this time.
Thanks,
Amit
Attachments:
v50-0003-Add-more-tests-for-partition-pruning.patch (text/plain; charset=UTF-8)
From 4ecc594076ff37d275d0aca485d959597ff83f4c Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Mon, 12 Mar 2018 21:13:38 +0900
Subject: [PATCH v50 3/5] Add more tests for partition pruning
---
src/test/regress/expected/partition_prune.out | 258 +++++++++++++++++++++++++-
src/test/regress/sql/partition_prune.sql | 88 ++++++++-
2 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..a0edba291f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1086,4 +1086,260 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(20 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(7 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+ -> Seq Scan on coll_pruning_multi3
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(7 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..d2b4561530 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,4 +152,90 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+
+-- pruning won't work for mc3p, because the leading key (a) is compared to a
+-- Param, which turns off the static pruning
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
2.11.0
v50-0004-Faster-partition-pruning.patch (text/plain; charset=UTF-8)
From e89b08f3aae8d3745ef3795220ca00bbd6c385f7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 22 Aug 2017 13:48:13 +0900
Subject: [PATCH v50 4/5] Faster partition pruning
This adds a new module partprune.c in the optimizer, which is meant
as a replacement for using constraint exclusion to prune individual
partitions. The new module performs partition pruning using the
information contained in parent/partitioned table's boundinfo, after
extracting clauses that involve partition keys.
With the new module's functionality in place, set_append_rel_size()
calls prune_append_rel_partitions() to get a Bitmapset of partitions
that need to be scanned and processes only the partitions contained
in the set.
Authors: Amit Langote,
David Rowley (david.rowley@2ndquadrant.com),
Dilip Kumar (dilipbalaut@gmail.com)
---
src/backend/catalog/partition.c | 1104 ++++++++++++++++
src/backend/nodes/copyfuncs.c | 37 +
src/backend/nodes/nodeFuncs.c | 25 +
src/backend/nodes/outfuncs.c | 28 +
src/backend/nodes/readfuncs.c | 30 +
src/backend/optimizer/path/allpaths.c | 28 +
src/backend/optimizer/util/Makefile | 2 +-
src/backend/optimizer/util/partprune.c | 1680 +++++++++++++++++++++++++
src/backend/optimizer/util/plancat.c | 42 +-
src/backend/optimizer/util/relnode.c | 5 +
src/include/catalog/partition.h | 25 +
src/include/catalog/pg_opfamily.h | 3 +
src/include/nodes/nodes.h | 3 +
src/include/nodes/primnodes.h | 73 ++
src/include/nodes/relation.h | 3 +
src/include/optimizer/partprune.h | 23 +
src/test/regress/expected/inherit.out | 4 +-
src/test/regress/expected/partition_prune.out | 285 ++++-
src/test/regress/sql/partition_prune.sql | 43 +-
src/tools/pgindent/typedefs.list | 7 +
20 files changed, 3372 insertions(+), 78 deletions(-)
create mode 100644 src/backend/optimizer/util/partprune.c
create mode 100644 src/include/optimizer/partprune.h
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 17b2716c66..73631ca0e7 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -84,6 +84,23 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * The following struct describes the result of performing one
+ * PartitionPruneStep.
+ */
+typedef struct PruneStepResult
+{
+ /*
+ * This contains the offsets of the bounds in a table's boundinfo, each of
+ * which is a bound whose corresponding partition is selected by a given
+ * pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ /* Set if we need to scan the default and/or the null partition, resp. */
+ bool scan_default;
+ bool scan_null;
+} PruneStepResult;
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -143,6 +160,23 @@ static int get_greatest_modulus(PartitionBoundInfo b);
static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+
/*
* RelationBuildPartitionDesc
* Form rel's partition descriptor
@@ -1566,9 +1600,1079 @@ get_partition_qual_relid(Oid relid)
return result;
}
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning steps
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **results,
+ *final_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for individual pruning steps to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ final_result = results[num_steps - 1];
+ Assert(final_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (final_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partition_bound_accepts_nulls(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (final_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(partition_bound_has_default(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
/* Module-local functions */
/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also contains whether special null-accepting and/or default
+ * partition need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_matching_*_bounds functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return NULL;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+ bool firststep;
+
+ /*
+ * A combine step without any source steps is an indication to not perform
+ * any partition pruning; we just return all partitions.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+
+ switch (cstep->combineOp)
+ {
+ case COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result, which is
+ * confirmed by the fact that cstep's step_id is greater than
+ * step_id and the fact that results of the individual steps
+ * are evaluated in sequence of their step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets = bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = bms_copy(step_result->bound_offsets);
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+
+ return result;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Evaluate 'expr', set *value to the resulting Datum. Return true if
+ * evaluation was possible, otherwise false.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * get_matching_hash_bounds
+ * Determine offset of the hash bound matching the specified values,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause.
+ *
+ * Generally this function will return a single matching bound offset,
+ * although if a partition has not been set up for a given modulus then we may
+ * return no matches. If the number of clauses found doesn't cover the entire
+ * partition key, then we'll need to return all offsets.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', the number of Datums in the 'values' array.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->bound_offsets =
+ bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ /*
+ * There is neither a special hash null partition nor the default hash
+ * partition.
+ */
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'value' contains the value to use for pruning.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result->scan_null = result->scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default if any.
+ */
+ if (nvalues == 0)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /* Special case handling of values coming from a <> operator clause. */
+ if (opstrategy == InvalidStrategy)
+ {
+ /*
+ * First match to all bounds. We'll remove any matching datums below.
+ */
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
+ value, &is_equal);
+ if (off >= 0 && is_equal)
+ {
+
+ /* We have a match. Remove from the result. */
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_del_member(result->bound_offsets,
+ off);
+ }
+
+ /* Always include the default partition if any. */
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ return result;
+ }
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_make_singleton(off);
+ }
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * If off is greater than the number of datums we have partitions
+ * for, the only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off < 0 means the value is smaller than the datums of all
+ * non-default partitions. The only possible partition that could
+ * contain a match is the default partition, but we must've set
+ * result->scan_default above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', number of Datums in 'values' array. Must be <= context->partnatts.
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result->scan_null = result->scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /* Look for the smallest bound that is = look-up value. */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat 'off' as the offset of the smallest bound
+ * to be included in the result, if we know it is the
+ * upper bound of the partition in which the look-up value
+ * could possibly exist. One case it couldn't is if the
+ * bound, or precisely the matched portion of its prefix,
+ * is not inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up values are inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up values and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, there is no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff <= maxoff)
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+}
+
+/*
* get_partition_operator
*
* Return oid of the operator of given strategy for a given partition key
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c3efca3c45..450c64d6fc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2136,6 +2136,37 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -5059,6 +5090,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4157e7eb9a..c3f1789ce2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d962670e..efd0a71a2c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1710,6 +1710,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -3958,6 +3980,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0cdb..25874074a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2596,6 +2622,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..fd89c7cfee 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,20 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1145,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..5b306193e1
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,1680 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Parses clauses attempting to match them up to partition keys of a
+ * given relation and generates a set of "pruning steps", which can be
+ * later "executed" either from the planner or the executor to determine
+ * the minimum set of partitions which match the given clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ bool op_is_ne; /* is clause's original operator <> ? */
+ Expr *expr; /* The expr the partition key is being
+ * compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+
+ /* cached info. */
+ int op_strategy;
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and the partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * generate_partition_pruning_steps() initializes an instance of this struct,
+ * which is used throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+static List *generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PartitionPruneStep *generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result = NULL;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = generate_partition_pruning_steps(rel, clauses,
+ &constfalse);
+
+ if (!constfalse)
+ {
+ /* Actual pruning happens here. */
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ /* Initiate partition pruning using clauses. */
+ memset(&context, 0, sizeof(context));
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+ }
+
+ return result;
+}
+
+/*
+ * generate_partition_pruning_steps
+ * Processes 'clauses' and returns a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(rel->boundinfo) &&
+ rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ (void) generate_partition_pruning_steps_internal(rel, &context, clauses,
+ constfalse);
+
+ return context.steps;
+}
+
+/* Module-local functions */
+
+/*
+ * generate_partition_pruning_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions is also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each of its
+ * arguments and generate PartitionPruneStepCombine step that will combine
+ * results of those steps.
+ *
+ * All of the generated steps are added to the context's steps List and each
+ * one gets an identifier which is unique across all recursive invocations.
+ *
+ * If, when going through clauses, we find any that is marked as pseudoconstant
+ * and contains a constant false value, we stop generating any further steps
+ * and simply return NIL (that is, no pruning steps) after setting *constfalse
+ * to true. The caller should consider all partitions as pruned in that case.
+ * We may do the same if we find that mutually contradictory clauses are
+ * present, but were not turned into a pseudoconstant at higher levels.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+generate_partition_pruning_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS];
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ *constfalse = false;
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(rinfo->clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the
+ * combine step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ argsteps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = generate_pruning_step_combine(context,
+ NIL,
+ COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids,
+ orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_UNION));
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may itself contain clauses of arbitrary type, so just
+ * recurse and later combine the component partitions sets
+ * using a combine step.
+ */
+ argsteps =
+ generate_partition_pruning_steps_internal(rel,
+ context,
+ args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ result =
+ lappend(result,
+ generate_pruning_step_combine(context,
+ arg_stepids,
+ COMBINE_INTERSECT));
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key(). We
+ * currently don't perform any pruning for more complex NOT
+ * clauses.
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ Assert(pc != NULL);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * If generate_opsteps is set to false it means no OpExprs were directly
+ * present in the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ result =
+ lappend(result,
+ generate_pruning_step_op(context, 0, false, NIL, NIL,
+ nullkeys));
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ result =
+ lappend(result,
+ generate_pruning_step_op(context, 0, false,
+ NIL, NIL, NULL));
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an INTERSECT combine step, if there is more than one.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ result = lappend(result,
+ generate_pruning_step_combine(context, step_ids,
+ COMBINE_INTERSECT));
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Attempt to match the given 'clause' with the specified partition key.
+ *
+ * Return value:
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments may be self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was an IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ *pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ (*pc)->opno = BooleanEqualOperator;
+ (*pc)->op_is_ne = false;
+ (*pc)->expr = expr;
+ /* We know that expr is of Boolean type. */
+ (*pc)->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+ (*pc)->op_strategy = InvalidStrategy;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+ bool is_opne_listp = false;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_opne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_opne_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ *pc = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ (*pc)->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ (*pc)->op_is_ne = false;
+ (*pc)->op_strategy = InvalidStrategy;
+
+ if (is_opne_listp)
+ {
+ Assert(OidIsValid(negator));
+ (*pc)->opno = negator;
+ (*pc)->op_is_ne = true;
+ /*
+ * We already know the strategy in this case, so may as well set
+ * it rather than having to look it up later.
+ */
+ (*pc)->op_strategy = BTEqualStrategyNumber;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ (*pc)->opno = commutator;
+ else
+ (*pc)->opno = opclause->opno;
+
+ (*pc)->expr = expr;
+ (*pc)->cmpfn = cmpfn;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse;
+
+ *clause_steps =
+ generate_partition_pruning_steps_internal(rel, context,
+ elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this clause
+ * is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't search any further.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of for us.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->op_is_ne,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column. This may
+ * not belong to the last partition key, but it is the
+ * clause belonging to the last partition key we found a
+ * clause for above.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * There might be multiple clauses which matched to that
+ * partition key; find the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix will take care of for us.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys we found IS NULL clauses
+ * for.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ false,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return generate_pruning_step_combine(context, opstep_ids,
+ COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = generate_pruning_step_op(context,
+ step_opstrategy, step_op_is_ne,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return list_make1(step);
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' contain, respectively, the expressions and
+ * cmpfns we've generated so far from the clauses for the previous part keys.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ result =
+ lappend(result,
+ generate_pruning_step_op(context,
+ step_opstrategy, step_op_is_ne,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys));
+ }
+ }
+
+ return result;
+}
+
+/*
+ * The following functions generate pruning steps of various types. Each step
+ * that's created is added to a context's 'steps' List and receives a unique
+ * step identifier.
+ */
+static PartitionPruneStep *
+generate_pruning_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+
+ /*
+ * For clauses that contain an <> operator, set opstrategy to
+ * InvalidStrategy to signal get_matching_list_bounds to do the
+ * right thing.
+ */
+ if (op_is_ne)
+ {
+ Assert(opstrategy == BTEqualStrategyNumber);
+ opstep->opstrategy = InvalidStrategy;
+ }
+ else
+ opstep->opstrategy = opstrategy;
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+static PartitionPruneStep *
+generate_pruning_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b46b33d4f7..52e4cca49a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..a068a0090e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,6 +154,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
@@ -567,6 +568,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
@@ -734,6 +736,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 62beee68b6..b4b4844f20 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -93,6 +93,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -125,4 +147,7 @@ extern List *get_proposed_default_constraint(List *new_part_constaints);
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
+/* For partition-pruning */
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
#endif /* PARTITION_H */
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fce48026b6..4df979e9eb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..965eb656a8 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,77 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+/*
+ * Node types to represent a partition pruning step
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the lookup key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. Both 'exprs' and 'cmpfns' contain the same number of
+ * items, up to partnatts.
+ *
+ * Once we find the offset of a partition bound using the lookup key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ COMBINE_UNION,
+ COMBINE_INTERSECT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 83b03b41e4..d4bcacbd4f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -535,6 +535,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
*
@@ -667,6 +669,7 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..1f2fe297a3
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern List *generate_partition_pruning_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 5e57b9a465..b2b912ed5c 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1951,11 +1951,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index a0edba291f..2d77b3edd4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1092,8 +1079,7 @@ explain (costs off) select * from boolpart where a is not unknown;
--
-- pruning for partitioned table appearing inside a sub-query
--
--- pruning won't work for mc3p, because the leading key (a) is compared to a
--- Param, which turns off the static pruning
+-- pruning won't work for mc3p, because some keys are Params
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
QUERY PLAN
-----------------------------------------------------------------------
@@ -1111,13 +1097,21 @@ explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1 t2_1
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p5 t2_2
+ -> Seq Scan on mc3p2 t2_2
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p7 t2_3
+ -> Seq Scan on mc3p3 t2_3
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default t2_4
+ -> Seq Scan on mc3p4 t2_4
Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
-(20 rows)
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
-- pruning should work fine, because values for a prefix of keys (a, b) are
-- available
@@ -1275,22 +1269,16 @@ explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' co
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-> Seq Scan on coll_pruning_multi2
Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
- -> Seq Scan on coll_pruning_multi3
- Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
-(7 rows)
+(5 rows)
-- pruning, with values provided for both keys
explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Append
- -> Seq Scan on coll_pruning_multi1
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-> Seq Scan on coll_pruning_multi2
Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
- -> Seq Scan on coll_pruning_multi3
- Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
-(7 rows)
+(3 rows)
--
-- LIKE operators don't prune
@@ -1343,3 +1331,188 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d2b4561530..ad5177715c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -159,9 +159,7 @@ explain (costs off) select * from boolpart where a is not unknown;
--
-- pruning for partitioned table appearing inside a sub-query
--
-
--- pruning won't work for mc3p, because the leading key (a) is compared to a
--- Param, which turns off the static pruning
+-- pruning won't work for mc3p, because some keys are Params
explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
-- pruning should work fine, because values for a prefix of keys (a, b) are
@@ -239,3 +237,40 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
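
To illustrate why the hash partitioning tests above only prune when every key column is pinned down by an equality or IS NULL clause, here is a minimal standalone C sketch. It is not the patch's code: KeyState, KeyInfo, toy_hash_combine() and hash_prune() are hypothetical names, the combiner is a toy rather than the server's real hash functions, and treating a NULL key as contributing nothing to the row hash is an assumption (it is at least consistent with the (null, null) row landing in hp0 above).

```c
#include <assert.h>
#include <stdint.h>

#define NKEYS 2

/* What the WHERE clause tells us about one partition key column. */
typedef enum { KEY_UNKNOWN, KEY_IS_NULL, KEY_EQ_VALUE } KeyState;

typedef struct
{
    KeyState state;
    uint64_t hash;      /* per-column hash; used only for KEY_EQ_VALUE */
} KeyInfo;

/* Toy combiner (assumption): a stand-in for the real hash-combining step. */
static uint64_t
toy_hash_combine(uint64_t a, uint64_t b)
{
    return a ^ (b + UINT64_C(0x9e3779b97f4a7c15) + (a << 6) + (a >> 2));
}

/*
 * Return a bitmask of partitions to scan (bit i stands for remainder i).
 * If any key is unknown, the row hash cannot be computed, so nothing
 * can be pruned and every partition is returned.
 */
static uint32_t
hash_prune(const KeyInfo keys[NKEYS], unsigned modulus)
{
    uint64_t row_hash = 0;
    int      i;

    assert(modulus >= 1 && modulus <= 32);

    for (i = 0; i < NKEYS; i++)
    {
        if (keys[i].state == KEY_UNKNOWN)
            return modulus == 32 ? UINT32_MAX
                                 : (UINT32_C(1) << modulus) - 1;
        if (keys[i].state == KEY_EQ_VALUE)
            row_hash = toy_hash_combine(row_hash, keys[i].hash);
        /* KEY_IS_NULL: assumed to leave the row hash unchanged */
    }

    return UINT32_C(1) << (row_hash % modulus);
}
```

With both keys known, the returned mask has exactly one bit set. If either key is KEY_UNKNOWN (for example only `a = 1` is given, or the clause is `a <> 1`), all partitions come back, which matches the "partial keys won't prune" cases.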
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d8a44cd9e..aa2ec281c4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1587,6 +1588,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1599,6 +1601,10 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
@@ -1752,6 +1758,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
--
2.11.0
v50-0005-Add-only-unpruned-partitioned-child-rels-to-part.patch (text/plain; charset=UTF-8)
From c544829f28e1192d904622dfcd98b4adf8fedb97 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Wed, 13 Sep 2017 18:24:55 +0900
Subject: [PATCH v50 5/5] Add only unpruned partitioned child rels to
partitioned_rels
The planner nodes (Merge)Append and ModifyTable each contain a
partitioned_rels list that records the RT indexes of partitioned
tables affected by that node. Currently, they record the indexes
of *all* partitioned tables in a given partition tree, because the
list is constructed during an earlier planning phase when it's not
known which of those tables will actually be affected by the plan.
Instead, construct such a list just before the Path for such a
plan is built, by which time that is known.
That means we no longer need the PartitionedChildRelInfo node and
some relevant code in prepunion.c.
---
src/backend/nodes/copyfuncs.c | 18 -------
src/backend/nodes/equalfuncs.c | 13 -----
src/backend/nodes/outfuncs.c | 16 +-----
src/backend/optimizer/path/allpaths.c | 99 ++++++++++++++++++++--------------
src/backend/optimizer/plan/planner.c | 99 ++++++++++++----------------------
src/backend/optimizer/prep/prepunion.c | 47 +++-------------
src/backend/optimizer/util/relnode.c | 3 ++
src/include/nodes/nodes.h | 1 -
src/include/nodes/relation.h | 30 +++--------
src/include/optimizer/planner.h | 5 --
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 111 insertions(+), 221 deletions(-)
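
As a rough illustration of the idea in the commit message above (only unpruned partitioned tables end up in partitioned_rels), here is a small standalone C sketch. It does not use the planner's data structures: ToyRel and collect_partitioned_rels() are hypothetical names, and the recursion is a simplification of the per-level "bubble up" that the allpaths.c hunks below perform with list_concat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8
#define MAX_RELS     32

/* Hypothetical stand-in for a RelOptInfo in a partition tree. */
typedef struct ToyRel
{
    int            rti;             /* range-table index */
    bool           is_partitioned;  /* partitioned table vs. leaf */
    bool           pruned;          /* proven empty during planning */
    int            nchildren;
    struct ToyRel *children[MAX_CHILDREN];
} ToyRel;

/*
 * Append to out[] the RT index of 'rel' and of every unpruned partitioned
 * descendant, starting at position n.  Returns the new element count.
 * Pruned subtrees and leaf partitions contribute nothing.
 */
static size_t
collect_partitioned_rels(const ToyRel *rel, int *out, size_t n)
{
    int i;

    if (rel->pruned || !rel->is_partitioned)
        return n;

    assert(n < MAX_RELS);
    out[n++] = rel->rti;

    for (i = 0; i < rel->nchildren; i++)
        n = collect_partitioned_rels(rel->children[i], out, n);

    return n;
}
```

For a root with one live and one pruned sub-partitioned child, only the root's and the live child's indexes are collected, so a node built from this list would not need to know about the pruned one.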
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 450c64d6fc..b0fa556f71 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2295,21 +2295,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5115,9 +5100,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 45ceba2830..28eecbbf08 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3214,9 +3204,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index efd0a71a2c..e6793b4716 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2296,6 +2295,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2345,6 +2345,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2570,16 +2571,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -4127,9 +4118,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd89c7cfee..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -884,6 +884,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
* If the partitioned relation has any baserestrictinfo quals then we
* attempt to use these quals to prune away partitions that cannot
* possibly contain any tuples matching these quals. In this case we'll
@@ -1337,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1347,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1374,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1435,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
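
The inheritance_planner() hunks above first gather parent RT indexes in a set (partitioned_relids) and only afterwards expand it into the partitioned_rels list. One plausible reason, and this is my reading rather than something the patch states, is that many child tables share the same parent, so a set avoids duplicate entries. A minimal standalone C sketch of that pattern, using a 64-bit mask as a toy stand-in for Bitmapset (ToyRelids, toy_add_member() and toy_relids_to_list() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset, limited to RT indexes 0..63. */
typedef uint64_t ToyRelids;

/* Add one RT index; adding the same index again is a no-op. */
static ToyRelids
toy_add_member(ToyRelids set, int rti)
{
    assert(rti >= 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

/*
 * Expand the set into out[] in ascending order, writing at most 'cap'
 * entries.  Returns the number of entries written.
 */
static size_t
toy_relids_to_list(ToyRelids set, int *out, size_t cap)
{
    size_t n = 0;
    int    rti;

    for (rti = 0; rti < 64; rti++)
    {
        if (set & (UINT64_C(1) << rti))
        {
            assert(n < cap);
            out[n++] = rti;
        }
    }

    return n;
}
```

Adding the same parent index once per child still yields a single list entry, which is the property the final list relies on in this sketch.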
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a068a0090e..b9aa7486ba 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -158,6 +158,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -572,6 +573,7 @@ build_join_rel(PlannerInfo *root,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -742,6 +744,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4df979e9eb..1ec8030d4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -265,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d4bcacbd4f..4dc4cc4547 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -254,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -320,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -539,6 +540,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -674,6 +678,7 @@ typedef struct RelOptInfo
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2129,27 +2134,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa2ec281c4..adde8eaee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1611,7 +1611,6 @@ PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
--
2.11.0
v50-0001-Add-partsupfunc-to-PartitionSchemeData.patch (text/plain; charset=UTF-8)
From 9ef64c7c5b45753abb33fb8181c8cc39e16b4557 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Fri, 9 Feb 2018 09:58:50 +0900
Subject: [PATCH v50 1/5] Add partsupfunc to PartitionSchemeData
partsupfunc is merely a cache of the value in PartitionKey to avoid
having to fetch it from the relcache at arbitrary points within the
planner. It's needed to compare a matched operator clause's constant
argument against partition bounds when performing partition pruning.
---
src/backend/optimizer/util/plancat.c | 24 ++++++++++++++++++++++--
src/include/nodes/relation.h | 4 ++++
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..b46b33d4f7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1881,7 +1881,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1900,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1918,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, must have the same
+ * partition comparison functions. Note that we cannot reliably
+ * Assert the equality of function structs themselves for they might
+ * be different across PartitionKey's, so just Assert for the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1965,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..83b03b41e4 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -356,6 +357,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
--
2.11.0
v50-0002-Expose-PartitionBoundInfoData-to-rest-of-the-bac.patch (text/plain; charset=UTF-8)
From 7fe833caf427cb0a9ea4992726307190558506f5 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Thu, 5 Apr 2018 17:05:22 +0900
Subject: [PATCH v50 2/5] Expose PartitionBoundInfoData to rest of the backend
An upcoming planner patch wants to know if any of the partitions in
the partition descriptor is the default partition, which is currently
hidden inside PartitionBoundInfoData.
---
src/backend/catalog/partition.c | 54 --------------------------------------
src/include/catalog/partition.h | 57 ++++++++++++++++++++++++++++++++++++++---
2 files changed, 54 insertions(+), 57 deletions(-)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..17b2716c66 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -56,60 +56,6 @@
#include "utils/syscache.h"
/*
- * Information about bounds of a partitioned relation
- *
- * A list partition datum that is known to be NULL is never put into the
- * datums array. Instead, it is tracked using the null_index field.
- *
- * In the case of range partitioning, ndatums will typically be far less than
- * 2 * nparts, because a partition's upper bound and the next partition's lower
- * bound are the same in most common cases, and we only store one of them (the
- * upper bound). In case of hash partitioning, ndatums will be same as the
- * number of partitions.
- *
- * For range and list partitioned tables, datums is an array of datum-tuples
- * with key->partnatts datums each. For hash partitioned tables, it is an array
- * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
- * given partition.
- *
- * The datums in datums array are arranged in increasing order as defined by
- * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
- * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
- * respectively. For range and list partitions this simply means that the
- * datums in the datums array are arranged in increasing order as defined by
- * the partition key's operator classes and collations.
- *
- * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound. In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions. For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
- */
-
-typedef struct PartitionBoundInfoData
-{
- char strategy; /* hash, list or range? */
- int ndatums; /* Length of the datums following array */
- Datum **datums;
- PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
- * NULL for hash and list partitioned
- * tables */
- int *indexes; /* Partition indexes */
- int null_index; /* Index of the null-accepting partition; -1
- * if there isn't one */
- int default_index; /* Index of the default partition; -1 if there
- * isn't one */
-} PartitionBoundInfoData;
-
-#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
-#define partition_bound_has_default(bi) ((bi)->default_index != -1)
-
-/*
* When qsort'ing partition bounds after reading from the catalog, each bound
* is represented with one of the following structs.
*/
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..62beee68b6 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -23,13 +23,64 @@
#define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD)
/*
- * PartitionBoundInfo encapsulates a set of partition bounds. It is usually
- * associated with partitioned tables as part of its partition descriptor.
+ * PartitionBoundInfoData encapsulates a set of partition bounds. It is
+ * usually associated with partitioned tables as part of its partition
+ * descriptor, but may also be used to represent a virtual partitioned
+ * table such as a partitioned joinrel within the planner.
*
- * The internal structure is opaque outside partition.c.
+ * A list partition datum that is known to be NULL is never put into the
+ * datums array. Instead, it is tracked using the null_index field.
+ *
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them (the
+ * upper bound). In case of hash partitioning, ndatums will be same as the
+ * number of partitions.
+ *
+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each. For hash partitioned tables, it is an array
+ * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
+ * given partition.
+ *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
+ * is an upper bound. In the case of hash partitioning, the number of the
+ * entries in the indexes array is same as the greatest modulus amongst all
+ * partitions. For a given partition key datum-tuple, the index of the
+ * partition which would accept that datum-tuple would be given by the entry
+ * pointed by remainder produced when hash value of the datum-tuple is divided
+ * by the greatest modulus.
*/
+
+typedef struct PartitionBoundInfoData
+{
+ char strategy; /* hash, list or range? */
+ int ndatums; /* Length of the datums following array */
+ Datum **datums;
+ PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
+ * NULL for hash and list partitioned
+ * tables */
+ int *indexes; /* Partition indexes */
+ int null_index; /* Index of the null-accepting partition; -1
+ * if there isn't one */
+ int default_index; /* Index of the default partition; -1 if there
+ * isn't one */
+} PartitionBoundInfoData;
+
typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
+#define partition_bound_has_default(bi) ((bi)->default_index != -1)
+
/*
* Information about partitions of a partitioned table.
*/
--
2.11.0
Amit Langote wrote:
1. Still not sure about RelOptInfo->has_default_part. This flag is
only looked at in generate_partition_pruning_steps. The RelOptInfo and
the boundinfo are available to look at, it's just that the
partition_bound_has_default macro is defined in partition.c rather
than partition.h.
Hmm, it might not be such a bad idea to bring out the
PartitionBoundInfoData into partition.h. If we do that, we won't need the
has_default_part that the patch adds to RelOptInfo.
In the attached v50 set, 0002 does that.
After looking at this for a moment, I again come to the conclusion that
the overall layout of partitioning code and definitions is terrible.
But we already know that, and there's a patch in commitfest to improve
things. So my intention right now is to hold my nose and get this
pushed; we'll fix it afterwards.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
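For readers following the has_default_part discussion, here is a minimal sketch of why exposing the struct makes the extra RelOptInfo flag unnecessary. It is not code from any of the patches: DemoBoundInfo, DemoRel and demo_rel_has_default_partition are hypothetical stand-ins, and only the two index fields and the shape of the two macros are taken from the quoted 0002 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in for PartitionBoundInfoData.  Only the two index
 * fields shown in the patch are kept; this is not the real struct.
 */
typedef struct DemoBoundInfo
{
    int null_index;     /* index of the null-accepting partition, or -1 */
    int default_index;  /* index of the default partition, or -1 */
} DemoBoundInfo;

/* Same shape as the macros that 0002 moves into partition.h. */
#define demo_bound_accepts_nulls(bi) ((bi)->null_index != -1)
#define demo_bound_has_default(bi)   ((bi)->default_index != -1)

/* Hypothetical stand-in for a planner rel that carries the bound info. */
typedef struct DemoRel
{
    const DemoBoundInfo *boundinfo;
} DemoRel;

/*
 * Hypothetical caller: once the struct is visible, the answer can be
 * read from the bound info directly instead of from a cached flag.
 */
static bool
demo_rel_has_default_partition(const DemoRel *rel)
{
    return rel->boundinfo != NULL && demo_bound_has_default(rel->boundinfo);
}
```

The point is only that the macro needs the struct definition to be visible to its caller, which is what moving it to the header provides.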
It seems pretty clear that putting get_matching_partitions() in
catalog/partition.c is totally the wrong thing; it belongs wholly in
partprune. I think the reason you put it there is that it requires
access to a lot of internals that are static in partition.c. In the
attached not-yet-cleaned-up version of the patch, I have moved a whole lot
of what you added to partition.c to partprune.c; and for the functions
and struct declarations that were required to make it work, I created
catalog/partition_internal.h.
I also changed a lot of code, but those changes are cosmetic only.
I'll clean this up a bit more now, and try to commit shortly (or early
tomorrow); wanted to share current status now in case I have to rush
out.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
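As an illustration of the kind of internals that pruning code needs to reach, here is a rough sketch of the hash-partition lookup described in the PartitionBoundInfoData comment: the greatest modulus sits at the end of the sorted datums array, and the indexes array is addressed by the hash value's remainder modulo that greatest modulus. DemoHashBounds and both demo_ functions are hypothetical simplifications (plain ints instead of Datums, a caller-supplied hash value), not the functions exported by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for the bound info of a hash-partitioned table. */
typedef struct DemoHashBounds
{
    int   ndatums;      /* number of (modulus, remainder) pairs */
    int (*datums)[2];   /* pairs sorted by modulus, then remainder */
    int  *indexes;      /* one entry per remainder of the greatest modulus */
} DemoHashBounds;

/* The greatest modulus is the modulus of the last pair in sorted order. */
static int
demo_greatest_modulus(const DemoHashBounds *b)
{
    assert(b != NULL && b->datums != NULL && b->ndatums > 0);
    return b->datums[b->ndatums - 1][0];
}

/*
 * Map an already-computed row hash to a partition index by taking the
 * remainder modulo the greatest modulus and looking it up in indexes.
 */
static int
demo_hash_partition_index(const DemoHashBounds *b, uint64_t row_hash)
{
    int greatest_modulus = demo_greatest_modulus(b);

    return b->indexes[row_hash % (uint64_t) greatest_modulus];
}
```

For example, with partitions (modulus 2, remainder 0), (modulus 4, remainder 1) and (modulus 4, remainder 3), the greatest modulus is 4 and the indexes array would be {0, 1, 0, 2}, so remainders 0 and 2 both land in the first partition.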
Attachments:
fastprune.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..d5e91b111d 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -24,6 +24,7 @@
#include "catalog/indexing.h"
#include "catalog/objectaddress.h"
#include "catalog/partition.h"
+#include "catalog/partition_internal.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
@@ -55,89 +56,6 @@
#include "utils/ruleutils.h"
#include "utils/syscache.h"
-/*
- * Information about bounds of a partitioned relation
- *
- * A list partition datum that is known to be NULL is never put into the
- * datums array. Instead, it is tracked using the null_index field.
- *
- * In the case of range partitioning, ndatums will typically be far less than
- * 2 * nparts, because a partition's upper bound and the next partition's lower
- * bound are the same in most common cases, and we only store one of them (the
- * upper bound). In case of hash partitioning, ndatums will be same as the
- * number of partitions.
- *
- * For range and list partitioned tables, datums is an array of datum-tuples
- * with key->partnatts datums each. For hash partitioned tables, it is an array
- * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
- * given partition.
- *
- * The datums in datums array are arranged in increasing order as defined by
- * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
- * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
- * respectively. For range and list partitions this simply means that the
- * datums in the datums array are arranged in increasing order as defined by
- * the partition key's operator classes and collations.
- *
- * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound. In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions. For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
- */
-
-typedef struct PartitionBoundInfoData
-{
- char strategy; /* hash, list or range? */
- int ndatums; /* Length of the datums following array */
- Datum **datums;
- PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
- * NULL for hash and list partitioned
- * tables */
- int *indexes; /* Partition indexes */
- int null_index; /* Index of the null-accepting partition; -1
- * if there isn't one */
- int default_index; /* Index of the default partition; -1 if there
- * isn't one */
-} PartitionBoundInfoData;
-
-#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
-#define partition_bound_has_default(bi) ((bi)->default_index != -1)
-
-/*
- * When qsort'ing partition bounds after reading from the catalog, each bound
- * is represented with one of the following structs.
- */
-
-/* One bound of a hash partition */
-typedef struct PartitionHashBound
-{
- int modulus;
- int remainder;
- int index;
-} PartitionHashBound;
-
-/* One value coming from some (index'th) list partition */
-typedef struct PartitionListValue
-{
- int index;
- Datum value;
-} PartitionListValue;
-
-/* One bound of a range partition */
-typedef struct PartitionRangeBound
-{
- int index;
- Datum *datums; /* range bound datums */
- PartitionRangeDatumKind *kind; /* the kind of each datum */
- bool lower; /* this is the lower (vs upper) bound */
-} PartitionRangeBound;
-
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -173,29 +91,9 @@ static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
Oid *partcollation, Datum *datums1,
PartitionRangeDatumKind *kind1, bool lower1,
PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
- Oid *partcollation,
- Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums, int n_tuple_datums);
-
-static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
- PartitionBoundInfo boundinfo,
- Datum value, bool *is_equal);
-static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
- Oid *partcollation,
- PartitionBoundInfo boundinfo,
- PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
- Oid *partcollation,
- PartitionBoundInfo boundinfo,
- int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
- int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
-static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
- Datum *values, bool *isnull);
+
/*
* RelationBuildPartitionDesc
@@ -765,13 +663,13 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
if (b1->strategy == PARTITION_STRATEGY_HASH)
{
- int greatest_modulus = get_greatest_modulus(b1);
+ int greatest_modulus = get_hash_partition_greatest_modulus(b1);
/*
* If two hash partitioned tables have different greatest moduli,
* their partition schemes don't match.
*/
- if (greatest_modulus != get_greatest_modulus(b2))
+ if (greatest_modulus != get_hash_partition_greatest_modulus(b2))
return false;
/*
@@ -1029,7 +927,7 @@ check_new_partition_bound(char *relname, Relation parent,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("every hash partition modulus must be a factor of the next larger modulus")));
- greatest_modulus = get_greatest_modulus(boundinfo);
+ greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
remainder = spec->remainder;
/*
@@ -1620,7 +1518,6 @@ get_partition_qual_relid(Oid relid)
return result;
}
-/* Module-local functions */
/*
* get_partition_operator
@@ -2637,7 +2534,7 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
case PARTITION_STRATEGY_HASH:
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
- int greatest_modulus = get_greatest_modulus(boundinfo);
+ int greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
uint64 rowHash = compute_hash_value(key->partnatts,
key->partsupfunc,
values, isnull);
@@ -2971,7 +2868,7 @@ partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
* of attributes resp.
*
*/
-static int32
+int32
partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
@@ -3005,7 +2902,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* *is_equal is set to true if the bound datum at the returned index is equal
* to the input value.
*/
-static int
+int
partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
@@ -3048,7 +2945,7 @@ partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
* *is_equal is set to true if the range bound at the returned index is equal
* to the input range bound
*/
-static int
+int
partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
Oid *partcollation,
PartitionBoundInfo boundinfo,
@@ -3093,7 +2990,7 @@ partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
* *is_equal is set to true if the range bound at the returned index is equal
* to the input tuple.
*/
-static int
+int
partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
@@ -3136,7 +3033,7 @@ partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
* less than or equal to the given (modulus, remainder) pair or -1 if
* all of them are greater
*/
-static int
+int
partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
@@ -3294,7 +3191,7 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound)
* The number of the entries in the indexes array is same as the
* greatest modulus.
*/
- num_indexes = get_greatest_modulus(bound);
+ num_indexes = get_hash_partition_greatest_modulus(bound);
break;
case PARTITION_STRATEGY_LIST:
@@ -3315,14 +3212,14 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound)
}
/*
- * get_greatest_modulus
+ * get_hash_partition_greatest_modulus
*
* Returns the greatest modulus of the hash partition bound. The greatest
* modulus will be at the end of the datums array because hash partitions are
* arranged in the ascending order of their modulus and remainders.
*/
-static int
-get_greatest_modulus(PartitionBoundInfo bound)
+int
+get_hash_partition_greatest_modulus(PartitionBoundInfo bound)
{
Assert(bound && bound->strategy == PARTITION_STRATEGY_HASH);
Assert(bound->datums && bound->ndatums > 0);
@@ -3336,7 +3233,7 @@ get_greatest_modulus(PartitionBoundInfo bound)
*
* Compute the hash value for given not null partition key values.
*/
-static uint64
+uint64
compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull)
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c3efca3c45..b0fa556f71 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2136,6 +2136,37 @@ _copyOnConflictExpr(const OnConflictExpr *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -2264,21 +2295,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5059,6 +5075,12 @@ copyObjectImpl(const void *from)
case T_OnConflictExpr:
retval = _copyOnConflictExpr(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
@@ -5078,9 +5100,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 45ceba2830..28eecbbf08 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -904,16 +904,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3214,9 +3204,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4157e7eb9a..c3f1789ce2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d962670e..e6793b4716 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1710,6 +1710,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -2249,7 +2271,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2274,6 +2295,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2323,6 +2345,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2548,16 +2571,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -3958,6 +3971,12 @@ outNode(StringInfo str, const void *obj)
case T_OnConflictExpr:
_outOnConflictExpr(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
@@ -4099,9 +4118,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0cdb..25874074a0 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* Stuff from parsenodes.h.
*/
@@ -2596,6 +2622,10 @@ parseNodeString(void)
return_value = _readFromExpr();
else if (MATCH("ONCONFLICTEXPR", 14))
return_value = _readOnConflictExpr();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..c36a254ed6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -33,6 +33,7 @@
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/geqo.h"
+#include "optimizer/partprune.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,31 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will
+ * bubble up the indexes of partitioned relations that appear down in
+ * the tree, so that when we've created Paths for all the children,
+ * the root partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1156,17 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning &&
+ !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /*
+ * Partition pruning determined this partition cannot possibly
+ * contain any tuples matching the baserestrictinfo, so skip it.
+ */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
@@ -1309,6 +1348,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1319,7 +1364,6 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
-
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
@@ -1346,49 +1390,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1407,9 +1457,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
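The control flow added to set_append_rel_size() above can be sketched in isolation: compute the set of surviving children once, then skip every child that is not in it. This is only an illustration under simplifying assumptions. A 64-bit mask stands in for Relids/Bitmapset, and LiveSet, live_set_is_member() and count_scanned_children() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for Relids: bit i set means RT index i survived pruning. */
typedef uint64_t LiveSet;

static bool
live_set_is_member(int relid, LiveSet set)
{
	return relid >= 0 && relid < 64 && ((set >> relid) & 1) != 0;
}

/*
 * Hypothetical analogue of the child loop.  Returns the number of children
 * that would still be considered for scanning.  The live set is consulted
 * only when did_pruning is true, because an empty set alone cannot
 * distinguish "everything was pruned" from "pruning was not attempted".
 */
static int
count_scanned_children(const int *child_relids, int nchildren,
					   bool did_pruning, LiveSet live_children)
{
	int			nscanned = 0;

	assert(nchildren >= 0);

	for (int i = 0; i < nchildren; i++)
	{
		if (did_pruning &&
			!live_set_is_member(child_relids[i], live_children))
			continue;			/* pruned; the real code marks it dummy */
		nscanned++;
	}

	return nscanned;
}
```

The separate did_pruning flag is what lets a NULL result from prune_append_rel_partitions() mean "no partition matches" instead of "scan everything".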
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
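In the inheritance_planner() hunks above, the parent RT index is added once per surviving child, so the same index can be added many times. Collecting into a set first and flattening to a list afterwards removes the duplicates and yields ascending order. A minimal standalone sketch of that pattern follows. It assumes a 64-bit mask in place of Bitmapset, and relid_set_add(), relid_set_next() and relid_set_to_array() are hypothetical helpers that only mimic the shape of the bms_add_member()/bms_next_member() loop.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a Bitmapset of RT indexes (0..63 only). */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int relid)
{
	assert(relid >= 0 && relid < 64);
	return set | (UINT64_C(1) << relid);
}

/* Smallest member greater than 'prev', or -1 if there is none. */
static int
relid_set_next(RelidSet set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if ((set >> i) & 1)
			return i;
	}
	return -1;
}

/* Flatten the set into 'out' in ascending order; returns the count. */
static int
relid_set_to_array(RelidSet set, int *out, int maxout)
{
	int			n = 0;
	int			i = -1;

	while ((i = relid_set_next(set, i)) >= 0)
	{
		assert(n < maxout);
		out[n++] = i;
	}
	return n;
}
```

Starting the iteration at -1 mirrors the loop in the patch, where the first call returns the smallest member.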
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/Makefile b/src/backend/optimizer/util/Makefile
index c54d0a690d..aebd98875e 100644
--- a/src/backend/optimizer/util/Makefile
+++ b/src/backend/optimizer/util/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/optimizer/util
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
-OBJS = clauses.o joininfo.o orclauses.o pathnode.o placeholder.o \
+OBJS = clauses.o joininfo.o orclauses.o partprune.o pathnode.o placeholder.o \
plancat.o predtest.o relnode.o restrictinfo.o tlist.o var.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
new file mode 100644
index 0000000000..b0638d5aa6
--- /dev/null
+++ b/src/backend/optimizer/util/partprune.c
@@ -0,0 +1,2823 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Parses clauses attempting to match them up to partition keys of a
+ * given relation and generates a set of "pruning steps", which can be
+ * later "executed" either from the planner or the executor to determine
+ * the minimum set of partitions which match the given clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/optimizer/util/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/partition.h"
+#include "catalog/partition_internal.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/partprune.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ bool op_is_ne; /* is clause's original operator <> ? */
+ Expr *expr; /* expr the partition key is compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+ int op_strategy; /* cached info. */
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result that match_clause_to_partition_key produces for
+ * the clause and partition key passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * gen_partprune_steps() initializes an instance of this struct, which is used
+ * throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+/* The result of performing one PartitionPruneStep */
+typedef struct PruneStepResult
+{
+ /*
+ * The offsets of bounds (in a table's boundinfo) whose partition is
+ * selected by the pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ bool scan_default; /* Scan the default partition? */
+ bool scan_null; /* Scan the partition for NULL values? */
+} PruneStepResult;
+
+
+static Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
+
+static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+static List *gen_partprune_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **rightop);
+static PartitionPruneStep *generate_pruning_steps_from_opexprs(
+ PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PartitionPruneStep *gen_prune_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *gen_prune_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = gen_partprune_steps(rel, clauses, &constfalse);
+ if (constfalse)
+ return NULL;
+
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ /* Actual pruning happens here. */
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+
+ return result;
+}
+
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+static Bitmapset *
+get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **results,
+ *final_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ final_result = results[num_steps - 1];
+ Assert(final_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a
+ * given range of data or for a given remainder, respectively.
+ * The default partition, if any, in case of range partitioning, will
+ * be added to the result, because the specified range still satisfies
+ * the query's conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (final_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partition_bound_accepts_nulls(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (final_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(partition_bound_has_default(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
+/*
+ * get_matching_hash_bounds
+ * Determine offset of the hash bound matching the specified values,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause.
+ *
+ * Generally this function will return a single matching bound offset,
+ * although if a partition has not been set up for a given modulus then we may
+ * return no matches. If the number of clauses found doesn't cover the entire
+ * partition key, then we'll need to return all offsets.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', the number of Datums in the 'values' array.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->bound_offsets =
+ bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ /*
+ * There is neither a special hash null partition nor a default hash
+ * partition.
+ */
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy.
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'value' contains the value to use for pruning.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result->scan_null = result->scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default if any.
+ */
+ if (nvalues == 0)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /* Special case handling of values coming from a <> operator clause. */
+ if (opstrategy == InvalidStrategy)
+ {
+ /*
+ * First match to all bounds. We'll remove any matching datums below.
+ */
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
+ value, &is_equal);
+ if (off >= 0 && is_equal)
+ {
+
+ /* We have a match. Remove from the result. */
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_del_member(result->bound_offsets,
+ off);
+ }
+
+ /* Always include the default partition if any. */
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ return result;
+ }
+
+ /*
+ * With range queries, always include the default list partition. Because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_make_singleton(off);
+ }
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also indicates whether the null-accepting and/or default
+ * partitions need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition look-up key that will be used by one of
+ * the get_matching_*_bounds functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function
+ * than the one cached in the PartitionKey, we'll need to
+ * look up the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return NULL;
+}
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', number of Datums in 'values' array. Must be <= context->partnatts.
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * passed to partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result->scan_null = result->scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause, just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /* Look for the smallest bound that is = look-up value. */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * look-up value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+ /*
+ * We can treat 'off' as the offset of the smallest bound
+ * to be included in the result, if we know it is the
+ * upper bound of the partition in which the look-up value
+ * could possibly exist. One case it couldn't is if the
+ * bound, or precisely the matched portion of its prefix,
+ * is not inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the look-up
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * only partition that may contain the look-up value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ /*
+ * off < 0, meaning the look-up value is smaller than all bounds,
+ * so only the default partition, if any, qualifies.
+ */
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ /*
+ * Look for the smallest bound that is > or >= look-up value
+ * and set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the look-up value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the look-up value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the look-up values are inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the look-up values and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+ /*
+ * Look-up value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * smallest partition that may contain the look-up value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ /*
+ * Look for the greatest bound that is < or <= look-up value
+ * and set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the look-up key in the default partition.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+ /*
+ * The look-up value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest
+ * bound that is <= look-up value, so add off + 1 to the
+ * result instead as the offset of the upper bound of the
+ * greatest partition that may contain look-up value. If
+ * the look-up value had exactly matched the bound, but it
+ * isn't inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values
+ * that would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether
+ * or not to scan the default partition based on whether the datum that
+ * will become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff <= maxoff)
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+}
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+ bool firststep;
+
+ /*
+ * A combine step without any source steps is an indication to not perform
+ * any partition pruning; we just return all partitions.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+
+ switch (cstep->combineOp)
+ {
+ case PARTPRUNE_COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result, which is
+ * confirmed by the fact that cstep's step_id is greater than
+ * step_id and the fact that results of the individual steps
+ * are evaluated in sequence of their step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets = bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case PARTPRUNE_COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+
+ return result;
+}
+
+
+/*
+ * gen_partprune_steps
+ * Process 'clauses' (a rel's baserestrictinfo list of clauses) and return
+ * a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+static List *
+gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(rel->boundinfo) &&
+ rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ *constfalse = false;
+ gen_partprune_steps_internal(rel, &context, clauses, constfalse);
+
+ return context.steps;
+}
+
+/*
+ * gen_partprune_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each argument, and
+ * return a PartitionPruneStepCombine of their results.
+ *
+ * The generated steps are added to the context's steps list. Each step is
+ * assigned a unique step identifier, across recursive calls.
+ *
+ * If we find clauses that are mutually contradictory, or a pseudoconstant
+ * clause that contains false, we set *constfalse to true and return NIL (no
+ * pruning steps). Caller should consider all partitions as pruned in that
+ * case.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * like to make a copy of it before passing it to this function.
+ */
+static List *
+gen_partprune_steps_internal(RelOptInfo *rel, GeneratePruningStepsContext *context,
+ List *clauses, bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS];
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(rinfo->clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExpr's out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the
+ * combine step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ arg_constfalse = false;
+ argsteps =
+ gen_partprune_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = gen_prune_step_combine(context, NIL,
+ PARTPRUNE_COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids, orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, arg_stepids,
+ PARTPRUNE_COMBINE_UNION);
+ result = lappend(result, step);
+ }
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may themselves contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ *constfalse = false;
+ argsteps = gen_partprune_steps_internal(rel, context, args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach (lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, arg_stepids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ result = lappend(result, step);
+ }
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key(). We
+ * currently don't perform any pruning for more complex NOT
+ * clauses.
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool unsupported_clause = false,
+ key_is_null = false,
+ key_is_not_null = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &key_is_null,
+ &key_is_not_null,
+ &pc, &clause_steps))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ Assert(pc != NULL);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (key_is_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else if (key_is_not_null)
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ else
+ Assert(false);
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ unsupported_clause = true;
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
+ }
+ }
+
+ /*
+ * If generate_opsteps is set to false it means no OpExprs were directly
+ * present in the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL, if
+ * any. To prune hash partitions, we must have found IS NULL clauses
+ * for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context, 0, false, NIL, NIL,
+ nullkeys);
+ result = lappend(result, step);
+ }
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context, 0, false, NIL, NIL, NULL);
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = generate_pruning_steps_from_opexprs(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an INTERSECT combine step, if more than one.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, step_ids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Attempt to match the given 'clause' with the specified partition key.
+ *
+ * Return value is:
+ * * PARTCLAUSE_NOMATCH if the clause doesn't match this partition key (but
+ * caller should keep trying, because it might match a subsequent key).
+ * Output arguments: none set.
+ *
+ * * PARTCLAUSE_MATCH_CLAUSE if there is a match.
+ * Output arguments: *pc is set to PartClauseInfo constructed for the
+ * matched clause.
+ *
+ * * PARTCLAUSE_MATCH_NULLNESS if there is a match, and the matched clause was
+ * either an "a IS NULL" or an "a IS NOT NULL" clause.
+ * Output arguments: *key_is_null is set in the former case, *key_is_not_null
+ * in the latter case.
+ *
+ * * PARTCLAUSE_MATCH_STEPS if there is a match.
+ * Output arguments: *clause_steps is set to a list of PartitionPruneStep
+ * generated for the clause.
+ *
+ * * PARTCLAUSE_MATCH_CONTRADICT if the clause is self-contradictory. This can
+ * only happen if it's a BoolExpr whose arguments are self-contradictory.
+ * Output arguments: none set.
+ *
+ * * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all
+ * due to one of its properties, such as argument volatility, even if it may
+ * have been matched with a key.
+ * Output arguments: none set.
+#if 0
+ *
+ * One of PARTCLAUSE_MATCH_* enum values if the clause is successfully
+ * matched to the partition key. If it is PARTCLAUSE_MATCH_CONTRADICT, then
+ * this means the clause is self-contradictory (which can happen only if it's
+ * a BoolExpr whose arguments are self-contradictory)
+ *
+ * PARTCLAUSE_NOMATCH if the clause doesn't match *this* partition key but
+ * the caller should continue trying because it may match a subsequent key
+ *
+ * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all,
+ * even if it may have been matched with a key, due to one of its properties,
+ * such as volatility of the arguments
+ *
+ * Based on the returned enum value, different output arguments are set as
+ * follows:
+ *
+ * PARTCLAUSE_UNSUPPORTED or
+ * PARTCLAUSE_NOMATCH or
+ * PARTCLAUSE_MATCH_CONTRADICT: None set (caller shouldn't rely on any of
+ * them being set)
+ *
+ * PARTCLAUSE_MATCH_CLAUSE: *pc set to PartClauseInfo constructed for the
+ * matched clause
+ *
+ * PARTCLAUSE_MATCH_NULLNESS: either *key_is_null or *key_is_not_null set
+ * based on whether the matched clause was a IS NULL or IS NOT NULL clause,
+ * respectively
+ *
+ * PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
+ * step(s)" generated for the clause due to it being a BoolExpr or a
+ * ScalarArrayOpExpr that's turned into one
+#endif
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *key_is_null, bool *key_is_not_null,
+ PartClauseInfo **pc, List **clause_steps)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ PartClauseInfo *partclause;
+
+ partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ partclause->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ partclause->opno = BooleanEqualOperator;
+ partclause->op_is_ne = false;
+ partclause->expr = expr;
+ /* We know that expr is of Boolean type. */
+ partclause->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+ partclause->op_strategy = InvalidStrategy;
+
+ *pc = partclause;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+ bool is_opne_listp = false;
+ PartClauseInfo *partclause;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, so we try to perform partition pruning with
+ * them only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_opne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_opne_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ partclause->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ partclause->op_is_ne = false;
+ partclause->op_strategy = InvalidStrategy;
+
+ if (is_opne_listp)
+ {
+ Assert(OidIsValid(negator));
+ partclause->opno = negator;
+ partclause->op_is_ne = true;
+ /*
+ * We already know the strategy in this case, so may as well set
+ * it rather than having to look it up later.
+ */
+ partclause->op_strategy = BTEqualStrategyNumber;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ partclause->opno = commutator;
+ else
+ partclause->opno = opclause->opno;
+
+ partclause->expr = expr;
+ partclause->cmpfn = cmpfn;
+
+ *pc = partclause;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element.
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = (Const *) lsecond(saop->args);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ /* Only consider non-null values. */
+ if (!elem_nulls[i])
+ {
+ Const *elem_expr = makeConst(ARR_ELEMTYPE(arrval),
+ -1, arr->constcollid,
+ elemlen,
+ elem_values[i],
+ false, elembyval);
+
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = (Expr *) make_opclause(saop_op, BOOLOID,
+ false,
+ leftop, rightop,
+ InvalidOid,
+ saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse = false;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ gen_partprune_steps_internal(rel, context, list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse = false;
+
+ *clause_steps =
+ gen_partprune_steps_internal(rel, context, elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ if (nulltest->nulltesttype == IS_NULL)
+ *key_is_null = true;
+ else
+ *key_is_not_null = true;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *rightop to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise, with *rightop set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **rightop)
+{
+ Expr *leftop;
+
+ *rightop = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *rightop = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+ else
+ {
+ leftop = not_clause((Node *) clause)
+ ? get_notclausearg(clause)
+ : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check whether the clause matches this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *rightop = (Expr *) makeBoolConst(false, false);
+
+ if (*rightop)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * generate_pruning_steps_from_opexprs
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+generate_pruning_steps_from_opexprs(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses,
+ Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key, depending on
+ * the strategy of this clause.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't search any further.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a look-up key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called 'prefix'.
+ * By appending the clause's own expression to the 'prefix',
+ * we'll generate one step using the so generated vector and
+ * assign the current strategy to it. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which case,
+ * we must generate steps for various combinations of
+ * expressions of different keys, which get_steps_using_prefix
+ * takes care of for us.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided they're
+ * from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->op_is_ne,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column. This may
+ * not belong to the last partition key, but it is the
+ * clause belonging to the last partition key we found a
+ * clause for above.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * There might be multiple clauses which matched to that
+ * partition key; find the first such clause. While at it,
+ * add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix will take care of for us.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys we found IS NULL clauses
+ * for.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ false,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return gen_prune_step_combine(context, opstep_ids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate a list of PartitionPruneStepOp steps, each using the given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context,
+ step_opstrategy,
+ step_op_is_ne,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return list_make1(step);
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' contain the expressions and cmpfns
+ * we've generated so far from the clauses for the previous part keys.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one
+ * for each clause with cur_keyno, which is all clauses from here
+ * onward till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ step = gen_prune_step_op(context,
+ step_opstrategy, step_op_is_ne,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys);
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Generate a pruning step for a specific operator.
+ *
+ * The step is assigned a unique step identifier and added to context's 'steps'
+ * list.
+ */
+static PartitionPruneStep *
+gen_prune_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+
+ /*
+ * For clauses that contain a <> operator, set opstrategy to
+ * InvalidStrategy to signal get_matching_list_bounds to do the
+ * right thing.
+ */
+ if (op_is_ne)
+ {
+ Assert(opstrategy == BTEqualStrategyNumber);
+ opstep->opstrategy = InvalidStrategy;
+ }
+ else
+ opstep->opstrategy = opstrategy;
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+/*
+ * Generate a pruning step for a combination of several other steps.
+ *
+ * The step is assigned a unique step identifier and added to context's
+ * 'steps' list.
+ */
+static PartitionPruneStep *
+gen_prune_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Evaluate 'expr', set *value to the resulting Datum. Return true if
+ * evaluation was possible, otherwise false.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
+
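As an aside for reviewers (not part of the patch): the combination logic described for get_steps_using_prefix above can be illustrated with a small standalone sketch. Everything in it is hypothetical: Clause and Step are simplified stand-ins for PartClauseInfo and PartitionPruneStepOp, plain ints stand in for expressions, and steps_using_prefix is an illustrative name. Unlike the patch, which reuses and extends the step_exprs/step_cmpfns lists across iterations, this sketch copies the partial vector for every branch, so it only shows the intended result: one lookup vector per combination of prefix clauses, each ending with the last key's value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 8				/* arbitrary limit for this sketch */

/* Hypothetical stand-in for PartClauseInfo: one candidate value for one key. */
typedef struct Clause
{
	int			keyno;			/* partition key the clause was matched to */
	int			value;			/* stand-in for the clause's expression */
} Clause;

/* Hypothetical stand-in for a PartitionPruneStepOp's expression vector. */
typedef struct Step
{
	int			nvalues;
	int			values[MAX_KEYS];
} Step;

/*
 * Emit one Step per combination of prefix clauses, taking one clause per
 * key.  'prefix' must be sorted by keyno; 'start' is the first clause of the
 * key to process in this invocation.  Returns the new number of steps.
 */
size_t
emit_steps(const Clause *prefix, size_t nprefix, size_t start,
		   int last_value, const Step *partial,
		   Step *out, size_t nout, size_t max_out)
{
	size_t		next = start;
	size_t		i;

	if (start == nprefix)
	{
		/* All prefix keys consumed: append the last key's value and emit. */
		assert(nout < max_out && partial->nvalues < MAX_KEYS);
		out[nout] = *partial;
		out[nout].values[out[nout].nvalues++] = last_value;
		return nout + 1;
	}

	/* Find the first clause belonging to the next key. */
	while (next < nprefix && prefix[next].keyno == prefix[start].keyno)
		next++;

	/* Branch once per clause of the current key, on a private copy. */
	for (i = start; i < next; i++)
	{
		Step		extended = *partial;

		assert(extended.nvalues < MAX_KEYS);
		extended.values[extended.nvalues++] = prefix[i].value;
		nout = emit_steps(prefix, nprefix, next, last_value, &extended,
						  out, nout, max_out);
	}
	return nout;
}

/* Entry point: an empty prefix yields a single one-element vector. */
size_t
steps_using_prefix(const Clause *prefix, size_t nprefix, int last_value,
				   Step *out, size_t max_out)
{
	Step		empty = {0};

	return emit_steps(prefix, nprefix, 0, last_value, &empty, out, 0, max_out);
}
```

For example, with two clauses on key 0 (values 1 and 2), one clause on key 1 (value 10), and a last-key value of 99, the sketch produces two vectors, (1, 10, 99) and (2, 10, 99), which corresponds to the "one step for each combination" behaviour the comments describe.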
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..52e4cca49a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
@@ -1881,7 +1891,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1910,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1928,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype match, the partition comparison
+ * functions must be the same too. Note that we cannot reliably Assert
+ * the equality of the function structs themselves, because they might
+ * differ across PartitionKeys, so just Assert that the function OIDs
+ * are equal.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1975,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..b9aa7486ba 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -567,9 +569,11 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -734,9 +738,13 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..0bcaa36165 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -23,13 +23,64 @@
#define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD)
/*
- * PartitionBoundInfo encapsulates a set of partition bounds. It is usually
- * associated with partitioned tables as part of its partition descriptor.
+ * PartitionBoundInfoData encapsulates a set of partition bounds. It is
+ * usually associated with partitioned tables as part of its partition
+ * descriptor, but may also be used to represent a virtual partitioned
+ * table such as a partitioned joinrel within the planner.
*
- * The internal structure is opaque outside partition.c.
+ * A list partition datum that is known to be NULL is never put into the
+ * datums array. Instead, it is tracked using the null_index field.
+ *
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them (the
+ * upper bound). In case of hash partitioning, ndatums will be the same as
+ * the number of partitions.
+ *
+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each. For hash partitioned tables, it is an array
+ * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
+ * given partition.
+ *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
+ * is an upper bound. In the case of hash partitioning, the number of
+ * entries in the indexes array is the same as the greatest modulus amongst
+ * all partitions. For a given partition key datum-tuple, the index of the
+ * partition which would accept that datum-tuple is given by the entry
+ * pointed to by the remainder produced when the hash value of the
+ * datum-tuple is divided by the greatest modulus.
*/
+
+typedef struct PartitionBoundInfoData
+{
+ char strategy; /* hash, list or range? */
+ int ndatums; /* Length of the datums array that follows */
+ Datum **datums;
+ PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
+ * NULL for hash and list partitioned
+ * tables */
+ int *indexes; /* Partition indexes */
+ int null_index; /* Index of the null-accepting partition; -1
+ * if there isn't one */
+ int default_index; /* Index of the default partition; -1 if there
+ * isn't one */
+} PartitionBoundInfoData;
+
typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
+#define partition_bound_has_default(bi) ((bi)->default_index != -1)
+
/*
* Information about partitions of a partitioned table.
*/
@@ -42,6 +93,28 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
@@ -58,6 +131,8 @@ extern List *get_qual_from_partbound(Relation rel, Relation parent,
extern List *map_partition_varattnos(List *expr, int fromrel_varno,
Relation to_rel, Relation from_rel,
bool *found_whole_row);
+extern int get_hash_partition_greatest_modulus(PartitionBoundInfo bound);
+
extern List *RelationGetPartitionQual(Relation rel);
extern Expr *get_partition_qual_relid(Oid relid);
extern bool has_partition_attrs(Relation rel, Bitmapset *attnums,
@@ -70,7 +145,6 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
-/* For tuple routing */
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
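
As an aside for reviewers, the hash-partition lookup described in the
PartitionBoundInfoData comment above can be illustrated with a minimal
standalone sketch. HashBoundSketch and hash_partition_for_value are
hypothetical names that are not part of the patch; the struct keeps only the
greatest modulus and the indexes array, and the hash value is assumed to have
been computed already.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for the hash-partitioning case of
 * PartitionBoundInfoData: indexes[] has one entry per possible remainder
 * of the greatest modulus, holding a partition index or -1 if no partition
 * accepts that remainder.
 */
typedef struct HashBoundSketch
{
	int			greatest_modulus;	/* length of indexes[] */
	const int  *indexes;			/* partition index per remainder */
} HashBoundSketch;

/*
 * Return the index of the partition accepting a tuple whose hash value is
 * 'hashvalue', or -1 if there is none.
 */
static int
hash_partition_for_value(const HashBoundSketch *b, uint64_t hashvalue)
{
	assert(b->greatest_modulus > 0);

	return b->indexes[hashvalue % (uint64_t) b->greatest_modulus];
}
```

For instance, with partitions p0 (modulus 2, remainder 0), p1 (modulus 4,
remainder 1) and p2 (modulus 4, remainder 3), the indexes array would be
{0, 1, 0, 2}, so a hash value of 10 (remainder 2) maps to p0.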
diff --git a/src/include/catalog/partition_internal.h b/src/include/catalog/partition_internal.h
new file mode 100644
index 0000000000..b21d422e75
--- /dev/null
+++ b/src/include/catalog/partition_internal.h
@@ -0,0 +1,65 @@
+/*-------------------------------------------------------------------------
+ *
+ * partition_internal.h
+ *
+ * Copyright (c) 2007-2018, PostgreSQL Global Development Group
+ *
+ * src/include/catalog/partition_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTITION_INTERNAL_H
+#define PARTITION_INTERNAL_H
+
+/*
+ * When qsort'ing partition bounds after reading from the catalog, each bound
+ * is represented with one of the following structs.
+ */
+
+/* One bound of a hash partition */
+typedef struct PartitionHashBound
+{
+ int modulus;
+ int remainder;
+ int index;
+} PartitionHashBound;
+
+/* One value coming from some (index'th) list partition */
+typedef struct PartitionListValue
+{
+ int index;
+ Datum value;
+} PartitionListValue;
+
+/* One bound of a range partition */
+typedef struct PartitionRangeBound
+{
+ int index;
+ Datum *datums; /* range bound datums */
+ PartitionRangeDatumKind *kind; /* the kind of each datum */
+ bool lower; /* this is the lower (vs upper) bound */
+} PartitionRangeBound;
+
+
+extern int get_hash_partition_greatest_modulus(PartitionBoundInfo b);
+extern int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal);
+extern int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ PartitionRangeBound *probe, bool *is_equal);
+extern int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal);
+extern int partition_hash_bsearch(PartitionBoundInfo boundinfo,
+ int modulus, int remainder);
+extern uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
+extern int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+ Datum *tuple_datums, int n_tuple_datums);
+
+#endif /* PARTITION_INTERNAL_H */
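
To make the purpose of the range bsearch functions declared above more
concrete, here is a minimal sketch of a range lookup over a single integer
key. It is not the patch's code: range_bound_bsearch and
range_partition_for_value are hypothetical names, bound kinds
(minvalue/maxvalue) and multi-column keys are ignored, and the indexes array
is assumed to have one more entry than the bounds array so that the slot
after the matched bound names the partition.

```c
#include <assert.h>

/*
 * Return the offset of the greatest bound that is <= value, or -1 if every
 * bound is greater than value. 'bounds' must be sorted in increasing order.
 */
static int
range_bound_bsearch(const int *bounds, int nbounds, int value)
{
	int			lo = -1;
	int			hi = nbounds - 1;

	while (lo < hi)
	{
		/* Round up so that mid > lo and the loop always makes progress. */
		int			mid = lo + (hi - lo + 1) / 2;

		if (bounds[mid] <= value)
			lo = mid;
		else
			hi = mid - 1;
	}

	return lo;
}

/*
 * Assumed layout: indexes[i] is the partition whose upper bound is
 * bounds[i], or -1 for a gap; indexes[nbounds] covers values at or beyond
 * the last bound.
 */
static int
range_partition_for_value(const int *bounds, const int *indexes,
						  int nbounds, int value)
{
	assert(nbounds > 0);

	return indexes[range_bound_bsearch(bounds, nbounds, value) + 1];
}
```

With partitions covering [1, 10), [10, 20) and [30, 40), the bounds would be
{1, 10, 20, 30, 40} and the indexes {-1, 0, 1, -1, 2, -1}; a value in the gap
between 20 and 30 then yields -1, meaning no partition accepts it.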
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..0847df97ff 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -188,4 +188,7 @@ DATA(insert OID = 4104 ( 3580 box_inclusion_ops PGNSP PGUID ));
DATA(insert OID = 5000 ( 4000 box_ops PGNSP PGUID ));
DATA(insert OID = 5008 ( 4000 poly_ops PGNSP PGUID ));
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
#endif /* PG_OPFAMILY_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fce48026b6..1ec8030d4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
@@ -262,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..7c4540b261 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+
+/*
+ * Node types to represent a partition pruning step.
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the lookup key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' contain the same number of items,
+ * which is at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the lookup key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ PARTPRUNE_COMBINE_UNION,
+ PARTPRUNE_COMBINE_INTERSECT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
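
The way PartitionPruneStepOp and PartitionPruneStepCombine nodes fit together
can be sketched as follows. This is only an illustration under stated
assumptions, not the patch's executor: PruneStepSketch and run_prune_steps
are hypothetical names, the set of partitions is a 32-bit mask, the outcome
of each "op" step is taken as already computed, combine steps are assumed to
refer only to earlier step ids, and the last step is assumed to hold the
final answer.

```c
#include <assert.h>
#include <stdint.h>

typedef enum StepKindSketch
{
	STEP_OP,					/* result comes from a bound lookup */
	STEP_COMBINE_UNION,			/* OR the results of the source steps */
	STEP_COMBINE_INTERSECT		/* AND the results of the source steps */
} StepKindSketch;

/* Hypothetical flattened form of a pruning step; its id is its array slot. */
typedef struct PruneStepSketch
{
	StepKindSketch kind;
	uint32_t	op_result;		/* STEP_OP only: surviving partitions */
	int			nsources;		/* combine steps only */
	int			source_stepids[4];
} PruneStepSketch;

#define MAX_STEPS_SKETCH 16

/* Evaluate the steps in order and return the result of the last one. */
static uint32_t
run_prune_steps(const PruneStepSketch *steps, int nsteps)
{
	uint32_t	results[MAX_STEPS_SKETCH];
	int			i;

	assert(nsteps > 0 && nsteps <= MAX_STEPS_SKETCH);

	for (i = 0; i < nsteps; i++)
	{
		const PruneStepSketch *step = &steps[i];
		uint32_t	acc;
		int			j;

		if (step->kind == STEP_OP)
		{
			results[i] = step->op_result;
			continue;
		}

		assert(step->nsources > 0 && step->nsources <= 4);

		/* Start from the identity element of the combining operation. */
		acc = (step->kind == STEP_COMBINE_INTERSECT) ? UINT32_MAX : 0;
		for (j = 0; j < step->nsources; j++)
		{
			int			src = step->source_stepids[j];

			/* Sources must have been evaluated already. */
			assert(src >= 0 && src < i);

			if (step->kind == STEP_COMBINE_UNION)
				acc |= results[src];
			else
				acc &= results[src];
		}
		results[i] = acc;
	}

	return results[nsteps - 1];
}
```

A clause of the shape (A AND B) OR C would then become two op steps for A and
B, an intersect step over them, an op step for C, and a final union step over
the intersect result and C.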
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..4dc4cc4547 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -253,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -319,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -356,6 +358,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
@@ -531,8 +536,13 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
+ * has_default_part - Whether the table has a default partition
+ * partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -663,10 +673,12 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -2122,27 +2134,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
new file mode 100644
index 0000000000..027016b32c
--- /dev/null
+++ b/src/include/optimizer/partprune.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/optimizer/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "nodes/relation.h"
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 5e57b9a465..b2b912ed5c 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1951,11 +1951,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..2d77b3edd4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1086,4 +1073,446 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..ad5177715c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -152,4 +152,125 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d8a44cd9e..adde8eaee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -830,6 +830,7 @@ GatherMergeState
GatherPath
GatherState
Gene
+GeneratePruningStepsContext
GenerationBlock
GenerationChunk
GenerationContext
@@ -1587,6 +1588,7 @@ ParsedText
ParsedWord
ParserSetupHook
ParserState
+PartClauseInfo
PartitionBoundInfo
PartitionBoundInfoData
PartitionBoundSpec
@@ -1599,13 +1601,16 @@ PartitionElem
PartitionHashBound
PartitionKey
PartitionListValue
+PartitionPruneContext
+PartitionPruneStep
+PartitionPruneStepCombine
+PartitionPruneStepOp
PartitionRangeBound
PartitionRangeDatum
PartitionRangeDatumKind
PartitionScheme
PartitionSpec
PartitionTupleRouting
-PartitionedChildRelInfo
PartitionwiseAggregateType
PasswordType
Path
@@ -1752,6 +1757,7 @@ ProjectionPath
ProtocolVersion
PrsStorage
PruneState
+PruneStepResult
PsqlScanCallbacks
PsqlScanQuoteType
PsqlScanResult
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
  * parentrte already has the root partrel's updatedCols translated to match
  * the attribute ordering of parentrel.
  */
- if (!*part_cols_updated)
-     *part_cols_updated =
+ if (!root->partColsUpdated)
+     root->partColsUpdated =
          has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
Hmm, surely this should be |= to avoid resetting a value set in a
previous call to this function? In the previous coding it wasn't
necessary because it was a local variable ... (though, isn't it a bit
odd to have this in PlannerInfo? seems like it should be in
resultRelInfo, but then you already have it there so I suppose this one
does *more*)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 6 April 2018 at 10:35, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I changed a lot of code also, but cosmetic changes only.
I'll clean this up a bit more now, and try to commit shortly (or early
tomorrow); wanted to share current status now in case I have to rush
out.
I made a complete pass over the patch you sent. I only noted down the
following few things:
1.
+ * off < 0, meaning the look-up value is smaller that all bounds,
that -> than
2. I guess this will be removed before commit.
+#if 0
<large section of comments>
+#endif
3. This comment seems like a strange thing to write just before
testing if the clause matches the partition key.
+ /* Clause does not match this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
4. Comment needs to be removed.
+ * has_default_part - Whether the table has a default partition
The only other thing I noted on this pass is that we could get rid of:
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
and just "continue" instead of "break" in all cases apart from case
PARTCLAUSE_UNSUPPORTED:
it would save a few lines and a single condition. What's there works,
but thought this might be better...
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 6 April 2018 at 12:02, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 6 April 2018 at 10:35, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
The only other thing I noted on this pass is that we could get rid of:
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
and just "continue" instead of "break" in all cases apart from case
PARTCLAUSE_UNSUPPORTED:
I should have said remove:
+ if (unsupported_clause)
The "break" would still be required.
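To make the control flow concrete, here is a minimal sketch of the loop shape being discussed. The enum and function names are hypothetical stand-ins, not the patch's actual identifiers; the only point is that `continue` inside a `switch` advances the enclosing loop, while `break` leaves just the `switch`, so an unconditional `break` after the `switch` ends the loop without a flag variable.

```c
/* Hypothetical stand-ins for the per-key match results in the patch. */
typedef enum
{
    KEY_NOMATCH,        /* clause does not match this key; try the next key */
    KEY_MATCH,          /* clause matched this key */
    KEY_UNSUPPORTED     /* clause cannot be used for pruning */
} KeyMatch;

/*
 * Walk the per-key results for one clause and return how many keys were
 * examined before the loop ended.
 */
static int
keys_examined(const KeyMatch *results, int nkeys)
{
    int examined = 0;

    for (int i = 0; i < nkeys; i++)
    {
        examined++;
        switch (results[i])
        {
            case KEY_NOMATCH:
                continue;   /* advances the for loop: try the next key */
            case KEY_MATCH:
            case KEY_UNSUPPORTED:
                break;      /* leaves only the switch */
        }
        break;              /* unconditional: done with this clause */
    }
    return examined;
}
```

With results {NOMATCH, MATCH, NOMATCH} the loop stops after the second key; with {UNSUPPORTED, MATCH} it stops after the first. No separate unsupported_clause test is needed to get out of the loop.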
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi.
On 2018/04/06 7:35, Alvaro Herrera wrote:
It seems pretty clear that putting get_matching_partitions() in
catalog/partition.c is totally the wrong thing; it belongs wholly in
partprune. I think the reason you put it there is that it requires
access to a lot of internals that are static in partition.c. In the
attached not yet cleaned version of the patch, I have moved a whole lot
of what you added to partition.c to partprune.c; and for the functions
and struct declarations that were required to make it work, I created
catalog/partition_internal.h.
Yes, I really wanted for most of the new code that this patch adds to land
in the planner, especially after Robert's comments here:
/messages/by-id/CA+Tgmoabi-29Vs8H0xkjtYB=cU+GVCrNwPz7okpa3KsoLmdEUQ@mail.gmail.com
It would've been nice if we'd gotten the "reorganizing partitioning code"
thread resolved sooner.
I changed a lot of code also, but cosmetic changes only.
I'll clean this up a bit more now, and try to commit shortly (or early
tomorrow); wanted to share current status now in case I have to rush
out.
Some comments on the code reorganizing part of the patch:
* Did you intentionally not put PartitionBoundInfoData and its accessor
macros in partition_internal.h? partprune.c would not need to include
partition.h if we do that.
* Also, I wonder why you left PartitionPruneContext in partition.h. Isn't
it better taken out to partprune.h?
I have done that in the attached.
* Why isn't gen_partprune_steps() in partprune.h? I see only
prune_append_rel_partitions() exported out of partprune.c, but the runtime
patch needs gen_partprune_steps() to be called from createplan.c.
* I don't see get_matching_partitions() exported either. Runtime pruning
patch needs that too.
Maybe you've thought something about these two items though.
Thanks,
Amit
Attachments:
fastprune-delta-partition-struct-movement.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index d5e91b111d..f89c99f544 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -24,7 +24,6 @@
#include "catalog/indexing.h"
#include "catalog/objectaddress.h"
#include "catalog/partition.h"
-#include "catalog/partition_internal.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_inherits_fn.h"
diff --git a/src/backend/optimizer/util/partprune.c b/src/backend/optimizer/util/partprune.c
index b0638d5aa6..93553b5d13 100644
--- a/src/backend/optimizer/util/partprune.c
+++ b/src/backend/optimizer/util/partprune.c
@@ -19,8 +19,6 @@
#include "access/hash.h"
#include "access/nbtree.h"
-#include "catalog/partition.h"
-#include "catalog/partition_internal.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_opfamily.h"
#include "catalog/pg_type.h"
@@ -35,6 +33,7 @@
#include "parser/parse_coerce.h"
#include "parser/parsetree.h"
#include "rewrite/rewriteManip.h"
+#include "utils/array.h"
#include "utils/lsyscache.h"
/*
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index 0bcaa36165..1c17553917 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -13,7 +13,7 @@
#ifndef PARTITION_H
#define PARTITION_H
-#include "fmgr.h"
+#include "catalog/partition_internal.h"
#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "parser/parse_node.h"
@@ -23,65 +23,6 @@
#define HASH_PARTITION_SEED UINT64CONST(0x7A5B22367996DCFD)
/*
- * PartitionBoundInfoData encapsulates a set of partition bounds. It is
- * usually associated with partitioned tables as part of its partition
- * descriptor, but may also be used to represent a virtual partitioned
- * table such as a partitioned joinrel within the planner.
- *
- * A list partition datum that is known to be NULL is never put into the
- * datums array. Instead, it is tracked using the null_index field.
- *
- * In the case of range partitioning, ndatums will typically be far less than
- * 2 * nparts, because a partition's upper bound and the next partition's lower
- * bound are the same in most common cases, and we only store one of them (the
- * upper bound). In case of hash partitioning, ndatums will be same as the
- * number of partitions.
- *
- * For range and list partitioned tables, datums is an array of datum-tuples
- * with key->partnatts datums each. For hash partitioned tables, it is an array
- * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
- * given partition.
- *
- * The datums in datums array are arranged in increasing order as defined by
- * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
- * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
- * respectively. For range and list partitions this simply means that the
- * datums in the datums array are arranged in increasing order as defined by
- * the partition key's operator classes and collations.
- *
- * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound. In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions. For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
- */
-
-typedef struct PartitionBoundInfoData
-{
- char strategy; /* hash, list or range? */
- int ndatums; /* Length of the datums following array */
- Datum **datums;
- PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
- * NULL for hash and list partitioned
- * tables */
- int *indexes; /* Partition indexes */
- int null_index; /* Index of the null-accepting partition; -1
- * if there isn't one */
- int default_index; /* Index of the default partition; -1 if there
- * isn't one */
-} PartitionBoundInfoData;
-
-typedef struct PartitionBoundInfoData *PartitionBoundInfo;
-
-#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
-#define partition_bound_has_default(bi) ((bi)->default_index != -1)
-
-/*
* Information about partitions of a partitioned table.
*/
typedef struct PartitionDescData
@@ -93,28 +34,6 @@ typedef struct PartitionDescData
typedef struct PartitionDescData *PartitionDesc;
-/*
- * PartitionPruneContext
- *
- * Information about a partitioned table needed to perform partition pruning.
- */
-typedef struct PartitionPruneContext
-{
- /* Partition key information */
- char strategy;
- int partnatts;
- Oid *partopfamily;
- Oid *partopcintype;
- Oid *partcollation;
- FmgrInfo *partsupfunc;
-
- /* Number of partitions */
- int nparts;
-
- /* Partition boundary info */
- PartitionBoundInfo boundinfo;
-} PartitionPruneContext;
-
extern void RelationBuildPartitionDesc(Relation relation);
extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
bool *parttypbyval, PartitionBoundInfo b1,
diff --git a/src/include/catalog/partition_internal.h b/src/include/catalog/partition_internal.h
index b21d422e75..e3b9123a2e 100644
--- a/src/include/catalog/partition_internal.h
+++ b/src/include/catalog/partition_internal.h
@@ -11,6 +11,9 @@
#ifndef PARTITION_INTERNAL_H
#define PARTITION_INTERNAL_H
+#include "fmgr.h"
+#include "nodes/parsenodes.h"
+
/*
* When qsort'ing partition bounds after reading from the catalog, each bound
* is represented with one of the following structs.
@@ -40,6 +43,63 @@ typedef struct PartitionRangeBound
bool lower; /* this is the lower (vs upper) bound */
} PartitionRangeBound;
+/*
+ * PartitionBoundInfoData encapsulates a set of partition bounds. It is
+ * usually associated with partitioned tables as part of its partition
+ * descriptor, but may also be used to represent a virtual partitioned
+ * table such as a partitioned joinrel within the planner.
+ *
+ * A list partition datum that is known to be NULL is never put into the
+ * datums array. Instead, it is tracked using the null_index field.
+ *
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them (the
+ * upper bound). In case of hash partitioning, ndatums will be same as the
+ * number of partitions.
+ *
+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each. For hash partitioned tables, it is an array
+ * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
+ * given partition.
+ *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
+ * is an upper bound. In the case of hash partitioning, the number of the
+ * entries in the indexes array is same as the greatest modulus amongst all
+ * partitions. For a given partition key datum-tuple, the index of the
+ * partition which would accept that datum-tuple would be given by the entry
+ * pointed by remainder produced when hash value of the datum-tuple is divided
+ * by the greatest modulus.
+ */
+typedef struct PartitionBoundInfoData
+{
+ char strategy; /* hash, list or range? */
+ int ndatums; /* Length of the datums following array */
+ Datum **datums;
+ PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
+ * NULL for hash and list partitioned
+ * tables */
+ int *indexes; /* Partition indexes */
+ int null_index; /* Index of the null-accepting partition; -1
+ * if there isn't one */
+ int default_index; /* Index of the default partition; -1 if there
+ * isn't one */
+} PartitionBoundInfoData;
+
+typedef struct PartitionBoundInfoData *PartitionBoundInfo;
+
+#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
+#define partition_bound_has_default(bi) ((bi)->default_index != -1)
extern int get_hash_partition_greatest_modulus(PartitionBoundInfo b);
extern int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
diff --git a/src/include/optimizer/partprune.h b/src/include/optimizer/partprune.h
index 027016b32c..093ade24c3 100644
--- a/src/include/optimizer/partprune.h
+++ b/src/include/optimizer/partprune.h
@@ -14,8 +14,31 @@
#ifndef PARTPRUNE_H
#define PARTPRUNE_H
+#include "catalog/partition_internal.h"
#include "nodes/relation.h"
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
extern Relids prune_append_rel_partitions(RelOptInfo *rel);
#endif /* PARTPRUNE_H */
On 2018/04/06 8:33, Alvaro Herrera wrote:
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
  * parentrte already has the root partrel's updatedCols translated to match
  * the attribute ordering of parentrel.
  */
- if (!*part_cols_updated)
-     *part_cols_updated =
+ if (!root->partColsUpdated)
+     root->partColsUpdated =
          has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
Hmm, surely this should be |= to avoid resetting a value set in a
previous call to this function?
It won't be, no? We set it only if it hasn't been already. Note that
there is one PlannerInfo per sub-query, so we determine this information
independently for each sub-query.
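A small sketch of why the guarded assignment cannot reset the flag. FakePlannerInfo and the two helpers are hypothetical; the boolean argument stands in for the result of has_partition_attrs() on one table. Once the flag is true the guard is false, so later calls leave it alone, and the final value is the same as with |=.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the planner state that carries the flag. */
typedef struct
{
    bool partColsUpdated;
} FakePlannerInfo;

/* Guarded form, as in the quoted hunk: assign only while still false. */
static void
note_guarded(FakePlannerInfo *root, bool rel_has_updated_key)
{
    if (!root->partColsUpdated)
        root->partColsUpdated = rel_has_updated_key;
}

/* The |= form suggested in review; it yields the same final flag value. */
static void
note_or(FakePlannerInfo *root, bool rel_has_updated_key)
{
    root->partColsUpdated |= rel_has_updated_key;
}
```

Feeding both helpers the sequence false, true, false leaves the flag true in each case. The practical difference is that in the guarded form the right-hand side sits behind the condition, so the check can be skipped entirely once the flag is set.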
In the previous coding it wasn't
necessary because it was a local variable ... (though, isn't it a bit
odd to have this in PlannerInfo? seems like it should be in
resultRelInfo, but then you already have it there so I suppose this one
does *more*)
Hmm, you'd think that we can figure this out in the executor itself, and
hence don't need to have this in PlannerInfo or in ModifyTable. But IIRC,
during the discussion of the update tuple routing patch, it became clear
that it's best to do that here, given the way things are now wrt the timing
of partition/inheritance tree expansion. An update query may modify the
partition key of a table at any arbitrary level and we have to look at all
the tables in the partition tree in this planning phase anyway, so it's
also the best time to see if the query's modifiedCols overlaps with the
partition key of some table in the tree. Once we've found that it does
for some table (most likely the root), we're done, that is, we know we got
some "partColsUpdated".
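To illustrate why the "set it only if it hasn't been already" form cannot reset an earlier true value, here is a minimal standalone sketch. It is not the planner code: PlannerInfoSketch, overlaps_partition_key() and note_partitioned_table() are hypothetical stand-ins, and plain bitmasks are assumed in place of the Bitmapsets that has_partition_attrs() works on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the single PlannerInfo field discussed here. */
typedef struct PlannerInfoSketch
{
	bool		partColsUpdated;
} PlannerInfoSketch;

/*
 * Hypothetical stand-in for has_partition_attrs(): columns are modelled as
 * bits in an unsigned mask, and we only ask whether the two masks overlap.
 */
static bool
overlaps_partition_key(unsigned updated_cols, unsigned partkey_cols)
{
	return (updated_cols & partkey_cols) != 0;
}

/*
 * Called once per partitioned table in the tree.  Once the flag is true,
 * later calls skip the assignment entirely, so the flag is never cleared.
 */
static void
note_partitioned_table(PlannerInfoSketch *root,
					   unsigned updated_cols, unsigned partkey_cols)
{
	assert(root != NULL);

	if (!root->partColsUpdated)
		root->partColsUpdated =
			overlaps_partition_key(updated_cols, partkey_cols);
}
```

In this sketch the guarded assignment behaves the same as "|=", with the small difference that the overlap check is skipped once the flag is already set.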
I realized that I had gotten rid of has_default_part from RelOptInfo but
hadn't deleted a comment about it; attached patch to fix that.
Thanks,
Amit
Attachments:
fastprune-delta-remove-obsolete-comment.patch (text/plain; charset=UTF-8)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 4dc4cc4547..170d22122a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -536,7 +536,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* part_scheme - Partitioning scheme of the relation
* boundinfo - Partition bounds
* nparts - Number of partitions
- * has_default_part - Whether the table has a default partition
* partition_qual - Partition constraint if not the root
* part_rels - RelOptInfos for each partition
* partexprs, nullable_partexprs - Partition key expressions
Amit Langote wrote:
Hi.
On 2018/04/06 7:35, Alvaro Herrera wrote:
It seems pretty clear that putting get_matching_partitions() in
catalog/partition.c is totally the wrong thing; it belongs wholly in
partprune. I think the reason you put it there is that it requires
access to a lot of internals that are static in partition.c. In the
attached not yet cleaned version of the patch, I have moved a whole lot
of what you added to partition.c to partprune.c; and for the functions
and struct declarations that were required to make it work, I created
catalog/partition_internal.h.
Yes, I really wanted for most of the new code that this patch adds to land
in the planner, especially after Robert's comments here:
/messages/by-id/CA+Tgmoabi-29Vs8H0xkjtYB=cU+GVCrNwPz7okpa3KsoLmdEUQ@mail.gmail.com
It would've been nice if we'd gotten the "reorganizing partitioning code"
thread resolved sooner.
Grumble.
I don't actually like very much the idea of putting all this code in
optimizer/util. This morning it occurred to me that we should create a new
src/backend/partitioning/ (and a src/include/partitioning/ to go with
it) and drop a bunch of files there. Even your proposed new partcache.c
will seem misplaced *anywhere*, since it contains support code to be
used by both planner and executor; in src/{backend,include}/partitioning
it will be able to serve both without it being a modularity wart.
BTW including partition_internal.h in partition.h would defeat the point
of having partition_internal.h in the first place -- at that point you'd
be better off just putting it all in partition.h and save the hassle of
a separate file. But given the liberty with which catalog/partition.h
has been included everywhere else, IMO that would be pretty disastrous.
I propose to work on reorganizing this code after the commitfest is
over, as part of release stabilization. I'd rather not have us
supporting a messy system for only five years, if we restructure during
pg12 (which would mean a lot of backpatching pain and pg11-specific
bugs); or worse, forever, if we keep the current proposed layout.
One thing I don't want to do is create a new file that we'll later have
to rename or move, so choosing the best locations is a necessity.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
David Rowley wrote:
2. I guess this will be removed before commit.
+#if 0
<large section of comments>
+#endif
Yeah, there is one sentence there I didn't quite understand and would
like to add it to the rewritten version of the comment before I remove
the whole ifdeffed-out comment.
* PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
* step(s)" generated for the clause due to it being a BoolExpr or a
* ScalarArrayOpExpr that's turned into one
Exactly what does "ScalarArrayOpExpr that's turned into one" mean?
Does it mean we turn SAOP into BoolExpr?
(Yes, I know "#if 0" inside a comment doesn't do anything. It's only
documentation for myself.)
If you look at the rest of the rewritten comment, you'll notice some
things probably need more explaining. Wording suggestions welcome.
3. This comment seems like a strange thing to write just before
testing if the clause matches the partition key.
+ /* Clause does not match this partition key. */
+ if (equal(leftop, partkey))
+ *rightop = not_clause((Node *) clause)
+ ? (Expr *) makeBoolConst(false, false)
+ : (Expr *) makeBoolConst(true, false);
Yeah. Looking at this function, I noticed it tests for BooleanTest, and
falls back to checking "not_clause" and a few equals. Does it make
sense if the clause is a SAOP? I added this assert:
Assert(IsA(clause, BooleanTest) ||
IsA(clause, BoolExpr) ||
IsA(clause, RelabelType));
and it failed:
#3 0x0000556cf04505db in match_boolean_partition_clause (partopfamily=424,
clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068)
at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159
2159 Assert(IsA(clause, BooleanTest) ||
(gdb) print *clause
$1 = {type = T_ScalarArrayOpExpr}
I'm not sure whether or not this function can trust that what's incoming
must absolutely be only those node types.
4. Comment needs removed.
+ * has_default_part - Whether the table has a default partition
Done.
The only other thing I noted on this pass is that we could get rid of:
+ /* go check the next clause. */
+ if (unsupported_clause)
+ break;
and just "continue" instead of "break" in all cases apart from case
PARTCLAUSE_UNSUPPORTED:
it would save a few lines and a single condition. What's there works,
but thought this might be better...
Makes sense -- looking.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
BTW, having both key_is_null and key_is_not_null output args to convey a
single bit of info is a bit lame. I'm removing it. We could do the
same with a single boolean, since the return value already indicates
it's a matching IS [NOT] NULL clause; we only need to indicate whether
the NOT is present.
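A rough sketch of the single-boolean interface, for illustration only. NullTestSketch and match_nulltest_to_partkey() are hypothetical names, and the clause is simplified to a column number plus a flag rather than a real NullTest node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IS [NOT] NULL clause on a single column. */
typedef struct NullTestSketch
{
	int			attno;			/* column the test is applied to */
	bool		is_not_null;	/* true for IS NOT NULL, false for IS NULL */
} NullTestSketch;

/*
 * Returns true if the clause tests the given partition key column.  On a
 * match, *clause_is_not_null says whether the NOT is present; on a
 * mismatch it is left untouched.
 */
static bool
match_nulltest_to_partkey(const NullTestSketch *clause, int partkey_attno,
						  bool *clause_is_not_null)
{
	assert(clause != NULL && clause_is_not_null != NULL);

	if (clause->attno != partkey_attno)
		return false;

	*clause_is_not_null = clause->is_not_null;
	return true;
}
```

The return value carries "this is a matching IS [NOT] NULL clause", so the one output argument only has to distinguish the two variants.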
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
Yeah. Looking at this function, I noticed it tests for BooleanTest, and
falls back to checking "not_clause" and a few equals. Does it make
sense if the clause is a SAOP? I added this assert:
Assert(IsA(clause, BooleanTest) ||
IsA(clause, BoolExpr) ||
IsA(clause, RelabelType));
and it failed:
#3 0x0000556cf04505db in match_boolean_partition_clause (partopfamily=424,
clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068)
at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159
2159 Assert(IsA(clause, BooleanTest) ||
(gdb) print *clause
$1 = {type = T_ScalarArrayOpExpr}
I'm not sure whether or not this function can trust that what's incoming
must absolutely be only those node types.
So this is what I need for current regression tests not to crash
anymore:
Assert(IsA(clause, BooleanTest) ||
IsA(clause, BoolExpr) ||
IsA(clause, RelabelType) ||
IsA(clause, ScalarArrayOpExpr) ||
IsA(clause, OpExpr) ||
IsA(clause, Var));
I'm not confident in my ability to write code to handle all possible
cases right now (obviously there must be more cases that are not covered
by current regression tests), so I'll leave it without the assert since
it handles a couple of the useful cases, but I suspect it could stand
some more improvement.
I guess the question is, how interesting is boolean partitioning? I bet
it has its uses.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 6, 2018 at 11:54 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Alvaro Herrera wrote:
Yeah. Looking at this function, I noticed it tests for BooleanTest, and
falls back to checking "not_clause" and a few equals. Does it make
sense if the clause is a SAOP? I added this assert:
Assert(IsA(clause, BooleanTest) ||
IsA(clause, BoolExpr) ||
IsA(clause, RelabelType));
and it failed:
#3 0x0000556cf04505db in match_boolean_partition_clause (partopfamily=424,
clause=0x556cf1041670, partkey=0x556cf1042218, rightop=0x7ffe520ec068)
at /pgsql/source/master/src/backend/optimizer/util/partprune.c:2159
2159 Assert(IsA(clause, BooleanTest) ||
(gdb) print *clause
$1 = {type = T_ScalarArrayOpExpr}
I'm not sure whether or not this function can trust that what's incoming
must absolutely be only those node types.
So this is what I need for current regression tests not to crash
anymore:
Assert(IsA(clause, BooleanTest) ||
IsA(clause, BoolExpr) ||
IsA(clause, RelabelType) ||
IsA(clause, ScalarArrayOpExpr) ||
IsA(clause, OpExpr) ||
IsA(clause, Var));
I'm not confident in my ability to write code to handle all possible
cases right now (obviously there must be more cases that are not covered
by current regression tests), so I'll leave it without the assert since
it handles a couple of the useful cases, but I suspect it could stand
some more improvement.
I guess the question is, how interesting is boolean partitioning? I bet
it has its uses.
match_boolean_partition_clauses() exists to capture some cases where
an OpExpr (any expression that returns a Boolean for that matter)
itself is the partition key:
create table boolpart (a int) partition by list ((a = 1));
create table boolpart_t partition of boolpart for values in ('true');
create table boolpart_f partition of boolpart for values in ('false');
explain select * from boolpart where a = 1;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..41.88 rows=13 width=4)
-> Seq Scan on boolpart_t (cost=0.00..41.88 rows=13 width=4)
Filter: (a = 1)
(3 rows)
explain select * from boolpart where a = 2;
QUERY PLAN
------------------------------------------------------------------
Append (cost=0.00..41.88 rows=13 width=4)
-> Seq Scan on boolpart_f (cost=0.00..41.88 rows=13 width=4)
Filter: (a = 2)
(3 rows)
So, it's not that we're only in a position to accept certain node types
in match_boolean_partition_clauses(). Before it existed, pruning didn't
work for such clauses because they weren't matched to the partition key in
the special way that match_boolean_partition_clauses() does it; instead they
ended up in the block in match_clause_to_partition_key() where OpExprs are
analyzed for normal (non-Boolean) situations, where we extract either
the leftop or rightop and try to match it with the partition key.
It might as well be:
create table boolpart (a int) partition by list ((a in (1, 2)));
which would require us to be in a position to match a ScalarArrayOpExpr
with the partition key.
This resembles match_boolean_index_clause(), by the way.
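To make the "clause itself is the partition key" matching concrete, here is a small standalone sketch. It is only an approximation: BoolClauseSketch and match_boolean_partkey() are hypothetical, expressions are compared as strings instead of with equal() on node trees, and only the plain and NOT forms are modelled (not BooleanTest).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified Boolean clause: an expression, possibly negated. */
typedef struct BoolClauseSketch
{
	bool		negated;		/* true if the clause is NOT (expr) */
	const char *expr;			/* text of the underlying expression */
} BoolClauseSketch;

/*
 * If the clause's expression is the partition key expression, report a
 * match and set *rightop to the Boolean constant the key is effectively
 * compared with: true for "expr", false for "NOT expr".
 */
static bool
match_boolean_partkey(const BoolClauseSketch *clause, const char *partkey,
					  bool *rightop)
{
	assert(clause != NULL && partkey != NULL && rightop != NULL);

	if (strcmp(clause->expr, partkey) != 0)
		return false;			/* not the partition key expression */

	*rightop = !clause->negated;
	return true;
}
```

With the boolpart table above, a clause "a = 1" would match the key expression (a = 1) with a right operand of true, which is what lets the list bound 'true' be looked up directly.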
Thanks,
Amit
On Fri, Apr 6, 2018 at 11:38 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Yeah, there is one sentence there I didn't quite understand and would
like to add it to the rewritten version of the comment before I remove
the whole ifdeffed-out comment.
* PARTCLAUSE_MATCH_STEPS: *clause_steps set to list of "partition pruning
* step(s)" generated for the clause due to it being a BoolExpr or a
* ScalarArrayOpExpr that's turned into one
Exactly what does "ScalarArrayOpExpr that's turned into one" mean?
Does it mean we turn SAOP into BoolExpr?
Yes, we turn a ScalarArrayOpExpr into a BoolExpr and generate a prune
step for the latter. Maybe we'll have a base pruning step that can
process a ScalarArrayOpExpr directly someday. We create base steps
only for OpExpr's for now.
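As a rough illustration of that conversion, the sketch below expands "key op ANY (array)" into one simple comparison per array element, combined with OR. The type and function names (BoolExprSketch, OpArmSketch, expand_saop) are hypothetical, the elements are plain ints rather than Datums, and treating the ALL form as an AND is an assumption of this sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARMS 8				/* arbitrary limit for this sketch */

typedef enum CombineOpSketch
{
	COMBINE_OR,					/* key op ANY (...) */
	COMBINE_AND					/* key op ALL (...), assumed here */
} CombineOpSketch;

/* One "key op value" comparison, standing in for an OpExpr. */
typedef struct OpArmSketch
{
	int			keycol;
	int			value;
} OpArmSketch;

/* Stand-in for the BoolExpr that the array expression is turned into. */
typedef struct BoolExprSketch
{
	CombineOpSketch boolop;
	size_t		narms;
	OpArmSketch arms[MAX_ARMS];
} BoolExprSketch;

/*
 * Expand an array comparison into one arm per element.  Returns false if
 * the array has more elements than this sketch can hold.
 */
static bool
expand_saop(int keycol, const int *elems, size_t nelems, bool use_or,
			BoolExprSketch *out)
{
	size_t		i;

	assert(elems != NULL && out != NULL);

	if (nelems > MAX_ARMS)
		return false;

	out->boolop = use_or ? COMBINE_OR : COMBINE_AND;
	out->narms = nelems;
	for (i = 0; i < nelems; i++)
	{
		out->arms[i].keycol = keycol;
		out->arms[i].value = elems[i];
	}
	return true;
}
```

So "a IN (1, 2)" becomes the equivalent of "a = 1 OR a = 2", and each arm is then something a base pruning step for an OpExpr can handle.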
If you look at the rest of the rewritten comment, you'll notice some
things probably need more explaining. Wording suggestions welcome.
When I looked at it earlier today, I thought your rewrite looked much better.
Thanks,
Amit
Amit Langote wrote:
Some comments on the code reorganizing part of the patch:
* Did you intentionally not put PartitionBoundInfoData and its accessor
macros in partition_internal.h? partprune.c would not need to include
partition.h if we do that.
Not really.
After pondering this some more, I decided to call the new file
src/include/partition/partbounds.h; and the other new file will become
src/include/partition/partprune.h. This leads naturally to the idea
that PartitionBoundInfoData will be in partbounds.h. However, the
typedef struct PartitionBoundInfoData *PartitionBoundInfo will have to
remain in catalog/partition.h, at least for the time being.
* Also, I wonder why you left PartitionPruneContext in partition.h. Isn't
it better taken out to partprune.h?
Yes.
* Why isn't gen_partprune_steps() in partprune.h? I see only
prune_append_rel_partitions() exported out of partprune.c, but the runtime
patch needs gen_partprune_steps() to be called from createplan.c.
* I don't see get_matching_partitions() exported either. Runtime pruning
patch needs that too.
True -- both exported.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here's my proposed patch.
Idle thought: how about renaming the "constfalse" argument and variables
to "contradictory" or maybe just "contradict"?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
partprune-51.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/Makefile b/src/backend/Makefile
index a4b6d1658c..42a0748ade 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -18,7 +18,8 @@ top_builddir = ../..
include $(top_builddir)/src/Makefile.global
SUBDIRS = access bootstrap catalog parser commands executor foreign lib libpq \
- main nodes optimizer port postmaster regex replication rewrite \
+ main nodes optimizer partitioning port postmaster \
+ regex replication rewrite \
statistics storage tcop tsearch utils $(top_builddir)/src/timezone \
jit
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 39ee773d93..a60e5c20d6 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -41,6 +41,7 @@
#include "optimizer/prep.h"
#include "optimizer/var.h"
#include "parser/parse_coerce.h"
+#include "partitioning/partbounds.h"
#include "rewrite/rewriteManip.h"
#include "storage/lmgr.h"
#include "utils/array.h"
@@ -55,89 +56,6 @@
#include "utils/ruleutils.h"
#include "utils/syscache.h"
-/*
- * Information about bounds of a partitioned relation
- *
- * A list partition datum that is known to be NULL is never put into the
- * datums array. Instead, it is tracked using the null_index field.
- *
- * In the case of range partitioning, ndatums will typically be far less than
- * 2 * nparts, because a partition's upper bound and the next partition's lower
- * bound are the same in most common cases, and we only store one of them (the
- * upper bound). In case of hash partitioning, ndatums will be same as the
- * number of partitions.
- *
- * For range and list partitioned tables, datums is an array of datum-tuples
- * with key->partnatts datums each. For hash partitioned tables, it is an array
- * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
- * given partition.
- *
- * The datums in datums array are arranged in increasing order as defined by
- * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
- * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
- * respectively. For range and list partitions this simply means that the
- * datums in the datums array are arranged in increasing order as defined by
- * the partition key's operator classes and collations.
- *
- * In the case of list partitioning, the indexes array stores one entry for
- * every datum, which is the index of the partition that accepts a given datum.
- * In case of range partitioning, it stores one entry per distinct range
- * datum, which is the index of the partition for which a given datum
- * is an upper bound. In the case of hash partitioning, the number of the
- * entries in the indexes array is same as the greatest modulus amongst all
- * partitions. For a given partition key datum-tuple, the index of the
- * partition which would accept that datum-tuple would be given by the entry
- * pointed by remainder produced when hash value of the datum-tuple is divided
- * by the greatest modulus.
- */
-
-typedef struct PartitionBoundInfoData
-{
- char strategy; /* hash, list or range? */
- int ndatums; /* Length of the datums following array */
- Datum **datums;
- PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
- * NULL for hash and list partitioned
- * tables */
- int *indexes; /* Partition indexes */
- int null_index; /* Index of the null-accepting partition; -1
- * if there isn't one */
- int default_index; /* Index of the default partition; -1 if there
- * isn't one */
-} PartitionBoundInfoData;
-
-#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
-#define partition_bound_has_default(bi) ((bi)->default_index != -1)
-
-/*
- * When qsort'ing partition bounds after reading from the catalog, each bound
- * is represented with one of the following structs.
- */
-
-/* One bound of a hash partition */
-typedef struct PartitionHashBound
-{
- int modulus;
- int remainder;
- int index;
-} PartitionHashBound;
-
-/* One value coming from some (index'th) list partition */
-typedef struct PartitionListValue
-{
- int index;
- Datum value;
-} PartitionListValue;
-
-/* One bound of a range partition */
-typedef struct PartitionRangeBound
-{
- int index;
- Datum *datums; /* range bound datums */
- PartitionRangeDatumKind *kind; /* the kind of each datum */
- bool lower; /* this is the lower (vs upper) bound */
-} PartitionRangeBound;
-
static Oid get_partition_parent_worker(Relation inhRel, Oid relid);
static void get_partition_ancestors_worker(Relation inhRel, Oid relid,
@@ -173,29 +91,9 @@ static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
Oid *partcollation, Datum *datums1,
PartitionRangeDatumKind *kind1, bool lower1,
PartitionRangeBound *b2);
-static int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
- Oid *partcollation,
- Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
- Datum *tuple_datums, int n_tuple_datums);
-
-static int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
- PartitionBoundInfo boundinfo,
- Datum value, bool *is_equal);
-static int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
- Oid *partcollation,
- PartitionBoundInfo boundinfo,
- PartitionRangeBound *probe, bool *is_equal);
-static int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
- Oid *partcollation,
- PartitionBoundInfo boundinfo,
- int nvalues, Datum *values, bool *is_equal);
-static int partition_hash_bsearch(PartitionBoundInfo boundinfo,
- int modulus, int remainder);
static int get_partition_bound_num_indexes(PartitionBoundInfo b);
-static int get_greatest_modulus(PartitionBoundInfo b);
-static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
- Datum *values, bool *isnull);
+
/*
* RelationBuildPartitionDesc
@@ -765,13 +663,13 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
if (b1->strategy == PARTITION_STRATEGY_HASH)
{
- int greatest_modulus = get_greatest_modulus(b1);
+ int greatest_modulus = get_hash_partition_greatest_modulus(b1);
/*
* If two hash partitioned tables have different greatest moduli,
* their partition schemes don't match.
*/
- if (greatest_modulus != get_greatest_modulus(b2))
+ if (greatest_modulus != get_hash_partition_greatest_modulus(b2))
return false;
/*
@@ -1029,7 +927,7 @@ check_new_partition_bound(char *relname, Relation parent,
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("every hash partition modulus must be a factor of the next larger modulus")));
- greatest_modulus = get_greatest_modulus(boundinfo);
+ greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
remainder = spec->remainder;
/*
@@ -1620,7 +1518,6 @@ get_partition_qual_relid(Oid relid)
return result;
}
-/* Module-local functions */
/*
* get_partition_operator
@@ -2637,7 +2534,7 @@ get_partition_for_tuple(Relation relation, Datum *values, bool *isnull)
case PARTITION_STRATEGY_HASH:
{
PartitionBoundInfo boundinfo = partdesc->boundinfo;
- int greatest_modulus = get_greatest_modulus(boundinfo);
+ int greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
uint64 rowHash = compute_hash_value(key->partnatts,
key->partsupfunc,
values, isnull);
@@ -2971,7 +2868,7 @@ partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
* of attributes resp.
*
*/
-static int32
+int32
partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
Datum *tuple_datums, int n_tuple_datums)
@@ -3005,7 +2902,7 @@ partition_rbound_datum_cmp(FmgrInfo *partsupfunc, Oid *partcollation,
* *is_equal is set to true if the bound datum at the returned index is equal
* to the input value.
*/
-static int
+int
partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
Datum value, bool *is_equal)
@@ -3048,7 +2945,7 @@ partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
* *is_equal is set to true if the range bound at the returned index is equal
* to the input range bound
*/
-static int
+int
partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
Oid *partcollation,
PartitionBoundInfo boundinfo,
@@ -3093,7 +2990,7 @@ partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
* *is_equal is set to true if the range bound at the returned index is equal
* to the input tuple.
*/
-static int
+int
partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
PartitionBoundInfo boundinfo,
int nvalues, Datum *values, bool *is_equal)
@@ -3136,7 +3033,7 @@ partition_range_datum_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
* less than or equal to the given (modulus, remainder) pair or -1 if
* all of them are greater
*/
-static int
+int
partition_hash_bsearch(PartitionBoundInfo boundinfo,
int modulus, int remainder)
{
@@ -3294,7 +3191,7 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound)
* The number of the entries in the indexes array is same as the
* greatest modulus.
*/
- num_indexes = get_greatest_modulus(bound);
+ num_indexes = get_hash_partition_greatest_modulus(bound);
break;
case PARTITION_STRATEGY_LIST:
@@ -3315,14 +3212,14 @@ get_partition_bound_num_indexes(PartitionBoundInfo bound)
}
/*
- * get_greatest_modulus
+ * get_hash_partition_greatest_modulus
*
* Returns the greatest modulus of the hash partition bound. The greatest
* modulus will be at the end of the datums array because hash partitions are
* arranged in the ascending order of their modulus and remainders.
*/
-static int
-get_greatest_modulus(PartitionBoundInfo bound)
+int
+get_hash_partition_greatest_modulus(PartitionBoundInfo bound)
{
Assert(bound && bound->strategy == PARTITION_STRATEGY_HASH);
Assert(bound->datums && bound->ndatums > 0);
@@ -3336,7 +3233,7 @@ get_greatest_modulus(PartitionBoundInfo bound)
*
* Compute the hash value for given not null partition key values.
*/
-static uint64
+uint64
compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
Datum *values, bool *isnull)
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d2e4aa3c2f..9287baaedc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2150,6 +2150,38 @@ _copyMergeAction(const MergeAction *from)
return newnode;
}
+/*
+ * _copyPartitionPruneStepOp
+ */
+static PartitionPruneStepOp *
+_copyPartitionPruneStepOp(const PartitionPruneStepOp *from)
+{
+ PartitionPruneStepOp *newnode = makeNode(PartitionPruneStepOp);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(opstrategy);
+ COPY_NODE_FIELD(exprs);
+ COPY_NODE_FIELD(cmpfns);
+ COPY_BITMAPSET_FIELD(nullkeys);
+
+ return newnode;
+}
+
+/*
+ * _copyPartitionPruneStepCombine
+ */
+static PartitionPruneStepCombine *
+_copyPartitionPruneStepCombine(const PartitionPruneStepCombine *from)
+{
+ PartitionPruneStepCombine *newnode = makeNode(PartitionPruneStepCombine);
+
+ COPY_SCALAR_FIELD(step.step_id);
+ COPY_SCALAR_FIELD(combineOp);
+ COPY_NODE_FIELD(source_stepids);
+
+ return newnode;
+}
+
/* ****************************************************************
* relation.h copy functions
*
@@ -2278,21 +2310,6 @@ _copyAppendRelInfo(const AppendRelInfo *from)
}
/*
- * _copyPartitionedChildRelInfo
- */
-static PartitionedChildRelInfo *
-_copyPartitionedChildRelInfo(const PartitionedChildRelInfo *from)
-{
- PartitionedChildRelInfo *newnode = makeNode(PartitionedChildRelInfo);
-
- COPY_SCALAR_FIELD(parent_relid);
- COPY_NODE_FIELD(child_rels);
- COPY_SCALAR_FIELD(part_cols_updated);
-
- return newnode;
-}
-
-/*
* _copyPlaceHolderInfo
*/
static PlaceHolderInfo *
@@ -5076,6 +5093,12 @@ copyObjectImpl(const void *from)
case T_MergeAction:
retval = _copyMergeAction(from);
break;
+ case T_PartitionPruneStepOp:
+ retval = _copyPartitionPruneStepOp(from);
+ break;
+ case T_PartitionPruneStepCombine:
+ retval = _copyPartitionPruneStepCombine(from);
+ break;
/*
* RELATION NODES
@@ -5095,9 +5118,6 @@ copyObjectImpl(const void *from)
case T_AppendRelInfo:
retval = _copyAppendRelInfo(from);
break;
- case T_PartitionedChildRelInfo:
- retval = _copyPartitionedChildRelInfo(from);
- break;
case T_PlaceHolderInfo:
retval = _copyPlaceHolderInfo(from);
break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index f2dd9035df..d758515cfd 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -916,16 +916,6 @@ _equalAppendRelInfo(const AppendRelInfo *a, const AppendRelInfo *b)
}
static bool
-_equalPartitionedChildRelInfo(const PartitionedChildRelInfo *a, const PartitionedChildRelInfo *b)
-{
- COMPARE_SCALAR_FIELD(parent_relid);
- COMPARE_NODE_FIELD(child_rels);
- COMPARE_SCALAR_FIELD(part_cols_updated);
-
- return true;
-}
-
-static bool
_equalPlaceHolderInfo(const PlaceHolderInfo *a, const PlaceHolderInfo *b)
{
COMPARE_SCALAR_FIELD(phid);
@@ -3230,9 +3220,6 @@ equal(const void *a, const void *b)
case T_AppendRelInfo:
retval = _equalAppendRelInfo(a, b);
break;
- case T_PartitionedChildRelInfo:
- retval = _equalPartitionedChildRelInfo(a, b);
- break;
case T_PlaceHolderInfo:
retval = _equalPlaceHolderInfo(a, b);
break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index f2f8227eb2..51c418778a 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2156,6 +2156,17 @@ expression_tree_walker(Node *node,
return true;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+
+ if (walker((Node *) opstep->exprs, context))
+ return true;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression subnodes */
+ break;
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
@@ -2958,6 +2969,20 @@ expression_tree_mutator(Node *node,
return (Node *) newnode;
}
break;
+ case T_PartitionPruneStepOp:
+ {
+ PartitionPruneStepOp *opstep = (PartitionPruneStepOp *) node;
+ PartitionPruneStepOp *newnode;
+
+ FLATCOPY(newnode, opstep, PartitionPruneStepOp);
+ MUTATE(newnode->exprs, opstep->exprs, List *);
+
+ return (Node *) newnode;
+ }
+ break;
+ case T_PartitionPruneStepCombine:
+ /* no expression sub-nodes */
+ return (Node *) copyObject(node);
case T_JoinExpr:
{
JoinExpr *join = (JoinExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6a1c16164..03a91c3352 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1711,6 +1711,28 @@ _outFromExpr(StringInfo str, const FromExpr *node)
}
static void
+_outPartitionPruneStepOp(StringInfo str, const PartitionPruneStepOp *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPOP");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_INT_FIELD(opstrategy);
+ WRITE_NODE_FIELD(exprs);
+ WRITE_NODE_FIELD(cmpfns);
+ WRITE_BITMAPSET_FIELD(nullkeys);
+}
+
+static void
+_outPartitionPruneStepCombine(StringInfo str, const PartitionPruneStepCombine *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNESTEPCOMBINE");
+
+ WRITE_INT_FIELD(step.step_id);
+ WRITE_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ WRITE_NODE_FIELD(source_stepids);
+}
+
+static void
_outOnConflictExpr(StringInfo str, const OnConflictExpr *node)
{
WRITE_NODE_TYPE("ONCONFLICTEXPR");
@@ -2261,7 +2283,6 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_NODE_FIELD(full_join_clauses);
WRITE_NODE_FIELD(join_info_list);
WRITE_NODE_FIELD(append_rel_list);
- WRITE_NODE_FIELD(pcinfo_list);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(placeholder_list);
WRITE_NODE_FIELD(fkey_list);
@@ -2286,6 +2307,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_INT_FIELD(wt_param_id);
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
+ WRITE_BOOL_FIELD(partColsUpdated);
}
static void
@@ -2335,6 +2357,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_NODE_FIELD(joininfo);
WRITE_BOOL_FIELD(has_eclass_joins);
WRITE_BITMAPSET_FIELD(top_parent_relids);
+ WRITE_NODE_FIELD(partitioned_child_rels);
}
static void
@@ -2560,16 +2583,6 @@ _outAppendRelInfo(StringInfo str, const AppendRelInfo *node)
}
static void
-_outPartitionedChildRelInfo(StringInfo str, const PartitionedChildRelInfo *node)
-{
- WRITE_NODE_TYPE("PARTITIONEDCHILDRELINFO");
-
- WRITE_UINT_FIELD(parent_relid);
- WRITE_NODE_FIELD(child_rels);
- WRITE_BOOL_FIELD(part_cols_updated);
-}
-
-static void
_outPlaceHolderInfo(StringInfo str, const PlaceHolderInfo *node)
{
WRITE_NODE_TYPE("PLACEHOLDERINFO");
@@ -3973,6 +3986,12 @@ outNode(StringInfo str, const void *obj)
case T_MergeAction:
_outMergeAction(str, obj);
break;
+ case T_PartitionPruneStepOp:
+ _outPartitionPruneStepOp(str, obj);
+ break;
+ case T_PartitionPruneStepCombine:
+ _outPartitionPruneStepCombine(str, obj);
+ break;
case T_Path:
_outPath(str, obj);
break;
@@ -4114,9 +4133,6 @@ outNode(StringInfo str, const void *obj)
case T_AppendRelInfo:
_outAppendRelInfo(str, obj);
break;
- case T_PartitionedChildRelInfo:
- _outPartitionedChildRelInfo(str, obj);
- break;
case T_PlaceHolderInfo:
_outPlaceHolderInfo(str, obj);
break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 37e3568595..2812dc9646 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1331,6 +1331,32 @@ _readOnConflictExpr(void)
READ_DONE();
}
+static PartitionPruneStepOp *
+_readPartitionPruneStepOp(void)
+{
+ READ_LOCALS(PartitionPruneStepOp);
+
+ READ_INT_FIELD(step.step_id);
+ READ_INT_FIELD(opstrategy);
+ READ_NODE_FIELD(exprs);
+ READ_NODE_FIELD(cmpfns);
+ READ_BITMAPSET_FIELD(nullkeys);
+
+ READ_DONE();
+}
+
+static PartitionPruneStepCombine *
+_readPartitionPruneStepCombine(void)
+{
+ READ_LOCALS(PartitionPruneStepCombine);
+
+ READ_INT_FIELD(step.step_id);
+ READ_ENUM_FIELD(combineOp, PartitionPruneCombineOp);
+ READ_NODE_FIELD(source_stepids);
+
+ READ_DONE();
+}
+
/*
* _readMergeAction
*/
@@ -2615,6 +2641,10 @@ parseNodeString(void)
return_value = _readOnConflictExpr();
else if (MATCH("MERGEACTION", 11))
return_value = _readMergeAction();
+ else if (MATCH("PARTITIONPRUNESTEPOP", 20))
+ return_value = _readPartitionPruneStepOp();
+ else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
+ return_value = _readPartitionPruneStepCombine();
else if (MATCH("RTE", 3))
return_value = _readRangeTblEntry();
else if (MATCH("RANGETBLFUNCTION", 16))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c4e4db15a6..65a34a255d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -43,6 +43,7 @@
#include "optimizer/var.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
+#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
@@ -874,6 +875,8 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ Relids live_children = NULL;
+ bool did_pruning = false;
/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();
@@ -881,6 +884,31 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Assert(IS_SIMPLE_REL(rel));
/*
+ * Initialize partitioned_child_rels to contain this RT index.
+ *
+ * Note that during the set_append_rel_pathlist() phase, we will bubble up
+ * the indexes of partitioned relations that appear down in the tree, so
+ * that when we've created Paths for all the children, the root
+ * partitioned table's list will contain all such indexes.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+ rel->partitioned_child_rels = list_make1_int(rti);
+
+ /*
+ * If the partitioned relation has any baserestrictinfo quals then we
+ * attempt to use these quals to prune away partitions that cannot
+ * possibly contain any tuples matching these quals. In this case we'll
+ * store the relids of all partitions which could possibly contain a
+ * matching tuple, and skip anything else in the loop below.
+ */
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE &&
+ rel->baserestrictinfo != NIL)
+ {
+ live_children = prune_append_rel_partitions(rel);
+ did_pruning = true;
+ }
+
+ /*
* Initialize to compute size estimates for whole append relation.
*
* We handle width estimates by weighting the widths of different child
@@ -1128,6 +1156,13 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
continue;
}
+ if (did_pruning && !bms_is_member(appinfo->child_relid, live_children))
+ {
+ /* This partition was pruned; skip it. */
+ set_dummy_rel_pathlist(childrel);
+ continue;
+ }
+
if (relation_excluded_by_constraints(root, childrel, childRTE))
{
/*
@@ -1309,6 +1344,12 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+ /* Bubble up childrel's partitioned children. */
+ if (rel->part_scheme)
+ rel->partitioned_child_rels =
+ list_concat(rel->partitioned_child_rels,
+ list_copy(childrel->partitioned_child_rels));
+
/*
* Child is live, so add it to the live_childrels list for use below.
*/
@@ -1346,49 +1387,55 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
- RangeTblEntry *rte;
bool build_partitioned_rels = false;
double partial_rows = -1;
- if (IS_SIMPLE_REL(rel))
+ /*
+ * AppendPath generated for partitioned tables must record the RT indexes
+ * of partitioned tables that are direct or indirect children of this
+ * Append rel.
+ *
+ * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
+ * itself does not represent a partitioned relation, but the child sub-
+ * queries may contain references to partitioned relations. The loop
+ * below will look for such children and collect them in a list to be
+ * passed to the path creation function. (This assumes that we don't need
+ * to look through multiple levels of subquery RTEs; if we ever do, we
+ * could consider stuffing the list we generate here into sub-query RTE's
+ * RelOptInfo, just like we do for partitioned rels, which would be used
+ * when populating our parent rel with paths. For the present, that
+ * appears to be unnecessary.)
+ */
+ if (rel->part_scheme != NULL)
{
- /*
- * A root partition will already have a PartitionedChildRelInfo, and a
- * non-root partitioned table doesn't need one, because its Append
- * paths will get flattened into the parent anyway. For a subquery
- * RTE, no PartitionedChildRelInfo exists; we collect all
- * partitioned_rels associated with any child. (This assumes that we
- * don't need to look through multiple levels of subquery RTEs; if we
- * ever do, we could create a PartitionedChildRelInfo with the
- * accumulated list of partitioned_rels which would then be found when
- * populated our parent rel with paths. For the present, that appears
- * to be unnecessary.)
- */
- rte = planner_rt_fetch(rel->relid, root);
- switch (rte->rtekind)
+ if (IS_SIMPLE_REL(rel))
+ partitioned_rels = rel->partitioned_child_rels;
+ else if (IS_JOIN_REL(rel))
{
- case RTE_RELATION:
- if (rte->relkind == RELKIND_PARTITIONED_TABLE)
- partitioned_rels =
- get_partitioned_child_rels(root, rel->relid, NULL);
- break;
- case RTE_SUBQUERY:
- build_partitioned_rels = true;
- break;
- default:
- elog(ERROR, "unexpected rtekind: %d", (int) rte->rtekind);
+ int relid = -1;
+
+ /*
+ * For a partitioned joinrel, concatenate the component rels'
+ * partitioned_child_rels lists.
+ */
+ while ((relid = bms_next_member(rel->relids, relid)) >= 0)
+ {
+ RelOptInfo *component;
+
+ Assert(relid >= 1 && relid < root->simple_rel_array_size);
+ component = root->simple_rel_array[relid];
+ Assert(component->part_scheme != NULL);
+ Assert(list_length(component->partitioned_child_rels) >= 1);
+ partitioned_rels =
+ list_concat(partitioned_rels,
+ list_copy(component->partitioned_child_rels));
+ }
}
+
+ Assert(list_length(partitioned_rels) >= 1);
}
- else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
- {
- /*
- * Associate PartitionedChildRelInfo of the root partitioned tables
- * being joined with the root partitioned join (indicated by
- * RELOPT_JOINREL).
- */
- partitioned_rels = get_partitioned_child_rels_for_join(root,
- rel->relids);
- }
+ else if (rel->rtekind == RTE_SUBQUERY)
+ build_partitioned_rels = true;
/*
* For every non-dummy child, remember the cheapest path. Also, identify
@@ -1407,9 +1454,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (build_partitioned_rels)
{
- List *cprels;
+ List *cprels = childrel->partitioned_child_rels;
- cprels = get_partitioned_child_rels(root, childrel->relid, NULL);
partitioned_rels = list_concat(partitioned_rels,
list_copy(cprels));
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 594ac8eacb..ec3f60d311 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -40,9 +40,7 @@
#include "utils/selfuncs.h"
-#define IsBooleanOpfamily(opfamily) \
- ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
-
+/* XXX see PartCollMatchesExprColl */
#define IndexCollMatchesExprColl(idxcollation, exprcollation) \
((idxcollation) == InvalidOid || (idxcollation) == (exprcollation))
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 15c8d34c70..008492bad5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -616,7 +616,6 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
- root->pcinfo_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
@@ -631,6 +630,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
+ root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
@@ -1191,12 +1191,12 @@ inheritance_planner(PlannerInfo *root)
ListCell *lc;
Index rti;
RangeTblEntry *parent_rte;
+ Relids partitioned_relids = NULL;
List *partitioned_rels = NIL;
PlannerInfo *parent_root;
Query *parent_parse;
Bitmapset *parent_relids = bms_make_singleton(top_parentRTindex);
PlannerInfo **parent_roots = NULL;
- bool partColsUpdated = false;
Assert(parse->commandType != CMD_INSERT);
@@ -1268,10 +1268,12 @@ inheritance_planner(PlannerInfo *root)
if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
{
nominalRelation = top_parentRTindex;
- partitioned_rels = get_partitioned_child_rels(root, top_parentRTindex,
- &partColsUpdated);
- /* The root partitioned table is included as a child rel */
- Assert(list_length(partitioned_rels) >= 1);
+
+ /*
+ * Root parent's RT index is always present in the partitioned_rels of
+ * the ModifyTable node, if one is needed at all.
+ */
+ partitioned_relids = bms_make_singleton(top_parentRTindex);
}
/*
@@ -1503,6 +1505,15 @@ inheritance_planner(PlannerInfo *root)
continue;
/*
+ * Add the current parent's RT index to the partitioned_relids set if
+ * we're going to create the ModifyTable path for a partitioned root
+ * table.
+ */
+ if (partitioned_relids)
+ partitioned_relids = bms_add_member(partitioned_relids,
+ appinfo->parent_relid);
+
+ /*
* If this is the first non-excluded child, its post-planning rtable
* becomes the initial contents of final_rtable; otherwise, append
* just its modified subquery RTEs to final_rtable.
@@ -1603,6 +1614,21 @@ inheritance_planner(PlannerInfo *root)
else
rowMarks = root->rowMarks;
+ if (partitioned_relids)
+ {
+ int i;
+
+ i = -1;
+ while ((i = bms_next_member(partitioned_relids, i)) >= 0)
+ partitioned_rels = lappend_int(partitioned_rels, i);
+
+ /*
+ * If we're going to create ModifyTable at all, the list should
+ * contain at least one member, that is, the root parent's index.
+ */
+ Assert(list_length(partitioned_rels) >= 1);
+ }
+
/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
add_path(final_rel, (Path *)
create_modifytable_path(root, final_rel,
@@ -1610,7 +1636,7 @@ inheritance_planner(PlannerInfo *root)
parse->canSetTag,
nominalRelation,
partitioned_rels,
- partColsUpdated,
+ root->partColsUpdated,
resultRelations,
0,
subpaths,
@@ -6145,65 +6171,6 @@ done:
}
/*
- * get_partitioned_child_rels
- * Returns a list of the RT indexes of the partitioned child relations
- * with rti as the root parent RT index. Also sets
- * *part_cols_updated to true if any of the root rte's updated
- * columns is used in the partition key either of the relation whose RTI
- * is specified or of any child relation.
- *
- * Note: This function might get called even for range table entries that
- * are not partitioned tables; in such a case, it will simply return NIL.
- */
-List *
-get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated)
-{
- List *result = NIL;
- ListCell *l;
-
- if (part_cols_updated)
- *part_cols_updated = false;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst_node(PartitionedChildRelInfo, l);
-
- if (pc->parent_relid == rti)
- {
- result = pc->child_rels;
- if (part_cols_updated)
- *part_cols_updated = pc->part_cols_updated;
- break;
- }
- }
-
- return result;
-}
-
-/*
- * get_partitioned_child_rels_for_join
- * Build and return a list containing the RTI of every partitioned
- * relation which is a child of some rel included in the join.
- */
-List *
-get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, root->pcinfo_list)
- {
- PartitionedChildRelInfo *pc = lfirst(l);
-
- if (bms_is_member(pc->parent_relid, join_relids))
- result = list_concat(result, list_copy(pc->child_rels));
- }
-
- return result;
-}
-
-/*
* add_paths_to_grouping_rel
*
* Add non-partial paths to grouping relation.
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5236ab378e..67e47887fc 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -104,8 +104,7 @@ static void expand_partitioned_rtentry(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated);
+ List **appinfos);
static void expand_single_inheritance_child(PlannerInfo *root,
RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
@@ -1587,9 +1586,6 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
/* Scan the inheritance set and expand it */
if (RelationGetPartitionDesc(oldrelation) != NULL)
{
- List *partitioned_child_rels = NIL;
- bool part_cols_updated = false;
-
Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
/*
@@ -1598,28 +1594,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
* extract the partition key columns of all the partitioned tables.
*/
expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
- lockmode, &root->append_rel_list,
- &partitioned_child_rels,
- &part_cols_updated);
-
- /*
- * We keep a list of objects in root, each of which maps a root
- * partitioned parent RT index to the list of RT indexes of descendant
- * partitioned child tables. When creating an Append or a ModifyTable
- * path for the parent, we copy the child RT index list verbatim to
- * the path so that it could be carried over to the executor so that
- * the latter could identify the partitioned child tables.
- */
- if (rte->inh && partitioned_child_rels != NIL)
- {
- PartitionedChildRelInfo *pcinfo;
-
- pcinfo = makeNode(PartitionedChildRelInfo);
- pcinfo->parent_relid = rti;
- pcinfo->child_rels = partitioned_child_rels;
- pcinfo->part_cols_updated = part_cols_updated;
- root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
- }
+ lockmode, &root->append_rel_list);
}
else
{
@@ -1694,8 +1669,7 @@ static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
Index parentRTindex, Relation parentrel,
PlanRowMark *top_parentrc, LOCKMODE lockmode,
- List **appinfos, List **partitioned_child_rels,
- bool *part_cols_updated)
+ List **appinfos)
{
int i;
RangeTblEntry *childrte;
@@ -1717,8 +1691,8 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
* parentrte already has the root partrel's updatedCols translated to match
* the attribute ordering of parentrel.
*/
- if (!*part_cols_updated)
- *part_cols_updated =
+ if (!root->partColsUpdated)
+ root->partColsUpdated =
has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
/* First expand the partitioned table itself. */
@@ -1726,14 +1700,6 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
top_parentrc, parentrel,
appinfos, &childrte, &childRTindex);
- /*
- * The partitioned table does not have data for itself but still need to
- * be locked. Update given list of partitioned children with RTI of this
- * partitioned relation.
- */
- *partitioned_child_rels = lappend_int(*partitioned_child_rels,
- childRTindex);
-
for (i = 0; i < partdesc->nparts; i++)
{
Oid childOID = partdesc->oids[i];
@@ -1760,8 +1726,7 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
expand_partitioned_rtentry(root, childrte, childRTindex,
childrel, top_parentrc, lockmode,
- appinfos, partitioned_child_rels,
- part_cols_updated);
+ appinfos);
/* Close child relation, but keep locks */
heap_close(childrel, NoLock);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 8a6baa7bea..52e4cca49a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1171,7 +1171,6 @@ get_relation_constraints(PlannerInfo *root,
Index varno = rel->relid;
Relation relation;
TupleConstr *constr;
- List *pcqual;
/*
* We assume the relation has already been safely locked.
@@ -1257,24 +1256,34 @@ get_relation_constraints(PlannerInfo *root,
}
}
- /* Append partition predicates, if any */
- pcqual = RelationGetPartitionQual(relation);
- if (pcqual)
+ /*
+ * Append partition predicates, if any.
+ *
+ * For selects, partition pruning uses the parent table's partition bound
+ * descriptor, instead of constraint exclusion which is driven by the
+ * individual partition's partition constraint.
+ */
+ if (root->parse->commandType != CMD_SELECT)
{
- /*
- * Run the partition quals through const-simplification similar to
- * check constraints. We skip canonicalize_qual, though, because
- * partition quals should be in canonical form already; also, since
- * the qual is in implicit-AND format, we'd have to explicitly convert
- * it to explicit-AND format and back again.
- */
- pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
+ List *pcqual = RelationGetPartitionQual(relation);
- /* Fix Vars to have the desired varno */
- if (varno != 1)
- ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+ if (pcqual)
+ {
+ /*
+ * Run the partition quals through const-simplification similar to
+ * check constraints. We skip canonicalize_qual, though, because
+ * partition quals should be in canonical form already; also,
+ * since the qual is in implicit-AND format, we'd have to
+ * explicitly convert it to explicit-AND format and back again.
+ */
+ pcqual = (List *) eval_const_expressions(root, (Node *) pcqual);
- result = list_concat(result, pcqual);
+ /* Fix Vars to have the desired varno */
+ if (varno != 1)
+ ChangeVarNodes((Node *) pcqual, 1, varno, 0);
+
+ result = list_concat(result, pcqual);
+ }
}
heap_close(relation, NoLock);
@@ -1869,6 +1878,7 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
rel->nparts = partdesc->nparts;
set_baserel_partition_key_exprs(relation, rel);
+ rel->partition_qual = RelationGetPartitionQual(relation);
}
/*
@@ -1881,7 +1891,8 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
{
PartitionKey partkey = RelationGetPartitionKey(relation);
ListCell *lc;
- int partnatts;
+ int partnatts,
+ i;
PartitionScheme part_scheme;
/* A partitioned table should have a partition key. */
@@ -1899,7 +1910,7 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
partnatts != part_scheme->partnatts)
continue;
- /* Match the partition key types. */
+ /* Match partition key type properties. */
if (memcmp(partkey->partopfamily, part_scheme->partopfamily,
sizeof(Oid) * partnatts) != 0 ||
memcmp(partkey->partopcintype, part_scheme->partopcintype,
@@ -1917,6 +1928,19 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
Assert(memcmp(partkey->parttypbyval, part_scheme->parttypbyval,
sizeof(bool) * partnatts) == 0);
+ /*
+ * If partopfamily and partopcintype matched, the partition comparison
+ * functions must be the same as well. Note that we cannot reliably
+ * Assert the equality of the function structs themselves, because they
+ * might differ across PartitionKeys, so just Assert on the function
+ * OIDs.
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (i = 0; i < partkey->partnatts; i++)
+ Assert(partkey->partsupfunc[i].fn_oid ==
+ part_scheme->partsupfunc[i].fn_oid);
+#endif
+
/* Found matching partition scheme. */
return part_scheme;
}
@@ -1951,6 +1975,12 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
memcpy(part_scheme->parttypbyval, partkey->parttypbyval,
sizeof(bool) * partnatts);
+ part_scheme->partsupfunc = (FmgrInfo *)
+ palloc(sizeof(FmgrInfo) * partnatts);
+ for (i = 0; i < partnatts; i++)
+ fmgr_info_copy(&part_scheme->partsupfunc[i], &partkey->partsupfunc[i],
+ CurrentMemoryContext);
+
/* Add the partitioning scheme to PlannerInfo. */
root->part_schemes = lappend(root->part_schemes, part_scheme);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index da8f0f93fc..b9aa7486ba 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -154,9 +154,11 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->part_scheme = NULL;
rel->nparts = 0;
rel->boundinfo = NULL;
+ rel->partition_qual = NIL;
rel->part_rels = NULL;
rel->partexprs = NULL;
rel->nullable_partexprs = NULL;
+ rel->partitioned_child_rels = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
@@ -567,9 +569,11 @@ build_join_rel(PlannerInfo *root,
joinrel->part_scheme = NULL;
joinrel->nparts = 0;
joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
/* Compute information relevant to the foreign relations. */
set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
@@ -734,9 +738,13 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->has_eclass_joins = false;
joinrel->top_parent_relids = NULL;
joinrel->part_scheme = NULL;
+ joinrel->nparts = 0;
+ joinrel->boundinfo = NULL;
+ joinrel->partition_qual = NIL;
joinrel->part_rels = NULL;
joinrel->partexprs = NULL;
joinrel->nullable_partexprs = NULL;
+ joinrel->partitioned_child_rels = NIL;
joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
inner_rel->top_parent_relids);
diff --git a/src/backend/partitioning/Makefile b/src/backend/partitioning/Makefile
new file mode 100644
index 0000000000..429207c4eb
--- /dev/null
+++ b/src/backend/partitioning/Makefile
@@ -0,0 +1,17 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for backend/partitioning
+#
+# IDENTIFICATION
+# src/backend/partitioning/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/partitioning
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = partprune.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
new file mode 100644
index 0000000000..cfde207df3
--- /dev/null
+++ b/src/backend/partitioning/partprune.c
@@ -0,0 +1,2780 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.c
+ * Parses clauses attempting to match them up to partition keys of a
+ * given relation and generates a set of "pruning steps", which can be
+ * later "executed" either from the planner or the executor to determine
+ * the minimum set of partitions which match the given clauses.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/partitioning/partprune.c
+ *
+ *-------------------------------------------------------------------------
+*/
+#include "postgres.h"
+
+#include "access/hash.h"
+#include "access/nbtree.h"
+#include "catalog/partition.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_opfamily.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
+#include "optimizer/planner.h"
+#include "optimizer/predtest.h"
+#include "optimizer/prep.h"
+#include "parser/parse_coerce.h"
+#include "parser/parsetree.h"
+#include "partitioning/partprune.h"
+#include "partitioning/partbounds.h"
+#include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+
+/*
+ * Information about a clause matched with a partition key.
+ */
+typedef struct PartClauseInfo
+{
+ int keyno; /* Partition key number (0 to partnatts - 1) */
+ Oid opno; /* operator used to compare partkey to 'expr' */
+ bool op_is_ne; /* is clause's original operator <> ? */
+ Expr *expr; /* expr the partition key is compared to */
+ Oid cmpfn; /* Oid of function to compare 'expr' to the
+ * partition key */
+ int op_strategy; /* cached info. */
+} PartClauseInfo;
+
+/*
+ * PartClauseMatchStatus
+ * Describes the result match_clause_to_partition_key produces for the
+ * clause and partition key that are passed to it
+ */
+typedef enum PartClauseMatchStatus
+{
+ PARTCLAUSE_NOMATCH,
+ PARTCLAUSE_MATCH_CLAUSE,
+ PARTCLAUSE_MATCH_NULLNESS,
+ PARTCLAUSE_MATCH_STEPS,
+ PARTCLAUSE_MATCH_CONTRADICT,
+ PARTCLAUSE_UNSUPPORTED
+} PartClauseMatchStatus;
+
+/*
+ * GeneratePruningStepsContext
+ * Information about the current state of generation of "pruning steps"
+ * for a given set of clauses
+ *
+ * gen_partprune_steps() initializes an instance of this struct, which is used
+ * throughout the step generation process.
+ */
+typedef struct GeneratePruningStepsContext
+{
+ int next_step_id;
+ List *steps;
+} GeneratePruningStepsContext;
+
+/* The result of performing one PartitionPruneStep */
+typedef struct PruneStepResult
+{
+ /*
+ * The offsets of bounds (in a table's boundinfo) whose partition is
+ * selected by the pruning step.
+ */
+ Bitmapset *bound_offsets;
+
+ bool scan_default; /* Scan the default partition? */
+ bool scan_null; /* Scan the partition for NULL values? */
+} PruneStepResult;
+
+
+static List *gen_partprune_steps_internal(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ List *clauses,
+ bool *constfalse);
+static PartitionPruneStep *gen_prune_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns, Bitmapset *nullkeys);
+static PartitionPruneStep *gen_prune_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp);
+static PartitionPruneStep *gen_prune_steps_from_opexps(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses, Bitmapset *nullkeys);
+static PartClauseMatchStatus match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *clause_is_not_null,
+ PartClauseInfo **pc, List **clause_steps);
+static List *get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix);
+static List *get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns);
+static PruneStepResult *get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys);
+static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep);
+static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results);
+static bool match_boolean_partition_clause(Oid partopfamily, Expr *clause,
+ Expr *partkey, Expr **outconst);
+static bool partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value);
+
+
+/*
+ * gen_partprune_steps
+ * Process 'clauses' (a rel's baserestrictinfo list of clauses) and return
+ * a list of "partition pruning steps"
+ *
+ * If any of the clauses in the input list is a pseudo-constant "false",
+ * *constfalse is set to true upon return.
+ */
+List *
+gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *constfalse)
+{
+ GeneratePruningStepsContext context;
+
+ context.next_step_id = 0;
+ context.steps = NIL;
+
+ /* The clauses list may be modified below, so better make a copy. */
+ clauses = list_copy(clauses);
+
+ /*
+ * For sub-partitioned tables there's a corner case where if the
+ * sub-partitioned table shares any partition keys with its parent, then
+ * it's possible that the partitioning hierarchy allows the parent
+ * partition to only contain a narrower range of values than the
+ * sub-partitioned table does. In this case it is possible that we'd
+ * include partitions that could not possibly have any tuples matching
+ * 'clauses'. The possibility of such a partition arrangement is perhaps
+ * unlikely for non-default partitions, but it may be more likely in the
+ * case of default partitions, so we'll add the parent partition table's
+ * partition qual to the clause list in this case only. This may result
+ * in the default partition being eliminated.
+ */
+ if (partition_bound_has_default(rel->boundinfo) &&
+ rel->partition_qual != NIL)
+ {
+ List *partqual = rel->partition_qual;
+
+ partqual = (List *) expression_planner((Expr *) partqual);
+
+ /* Fix Vars to have the desired varno */
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);
+
+ clauses = list_concat(clauses, partqual);
+ }
+
+ /* Down into the rabbit-hole. */
+ *constfalse = false;
+ gen_partprune_steps_internal(rel, &context, clauses, constfalse);
+
+ return context.steps;
+}
+
+/*
+ * prune_append_rel_partitions
+ * Returns RT indexes of the minimum set of child partitions which must
+ * be scanned to satisfy rel's baserestrictinfo quals.
+ *
+ * Callers must ensure that 'rel' is a partitioned table.
+ */
+Relids
+prune_append_rel_partitions(RelOptInfo *rel)
+{
+ Relids result;
+ List *clauses = rel->baserestrictinfo;
+ List *pruning_steps;
+ bool constfalse;
+ PartitionPruneContext context;
+ Bitmapset *partindexes;
+ int i;
+
+ Assert(clauses != NIL);
+ Assert(rel->part_scheme != NULL);
+
+ /* Quick exit. */
+ if (rel->nparts == 0)
+ return NULL;
+
+ /* process clauses */
+ pruning_steps = gen_partprune_steps(rel, clauses, &constfalse);
+ if (constfalse)
+ return NULL;
+
+ context.strategy = rel->part_scheme->strategy;
+ context.partnatts = rel->part_scheme->partnatts;
+ context.partopfamily = rel->part_scheme->partopfamily;
+ context.partopcintype = rel->part_scheme->partopcintype;
+ context.partcollation = rel->part_scheme->partcollation;
+ context.partsupfunc = rel->part_scheme->partsupfunc;
+ context.nparts = rel->nparts;
+ context.boundinfo = rel->boundinfo;
+
+ /* Actual pruning happens here. */
+ partindexes = get_matching_partitions(&context, pruning_steps);
+
+ /* Add selected partitions' RT indexes to result. */
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(partindexes, i)) >= 0)
+ result = bms_add_member(result, rel->part_rels[i]->relid);
+
+ return result;
+}
+
+/*
+ * get_matching_partitions
+ * Determine partitions that survive partition pruning
+ *
+ * Returns a Bitmapset of indexes of surviving partitions.
+ */
+Bitmapset *
+get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
+{
+ Bitmapset *result;
+ int num_steps = list_length(pruning_steps),
+ i;
+ PruneStepResult **results,
+ *final_result;
+ ListCell *lc;
+
+ /* If there are no pruning steps then all partitions match. */
+ if (num_steps == 0)
+ return bms_add_range(NULL, 0, context->nparts - 1);
+
+ /*
+ * Allocate space for each pruning step to store its result. Each
+ * slot will hold a PruneStepResult after performing a given pruning step.
+ * Later steps may use the result of one or more earlier steps. The
+ * result of applying all pruning steps is the value contained in the slot
+ * of the last pruning step.
+ */
+ results = (PruneStepResult **)
+ palloc0(num_steps * sizeof(PruneStepResult *));
+ foreach(lc, pruning_steps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ switch (nodeTag(step))
+ {
+ case T_PartitionPruneStepOp:
+ results[step->step_id] =
+ perform_pruning_base_step(context,
+ (PartitionPruneStepOp *) step);
+ break;
+
+ case T_PartitionPruneStepCombine:
+ results[step->step_id] =
+ perform_pruning_combine_step(context,
+ (PartitionPruneStepCombine *) step,
+ results);
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning step type: %d",
+ (int) nodeTag(step));
+ }
+ }
+
+ /*
+ * At this point we know the offsets of all the datums whose corresponding
+ * partitions need to be in the result, including special null-accepting
+ * and default partitions. Collect the actual partition indexes now.
+ */
+ final_result = results[num_steps - 1];
+ Assert(final_result != NULL);
+ i = -1;
+ result = NULL;
+ while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
+ {
+ int partindex = context->boundinfo->indexes[i];
+
+ /*
+ * In range and hash partitioning cases, some slots may contain -1,
+ * indicating that no partition has been defined to accept a given
+ * range of data or for a given remainder, respectively. The default
+ * partition, if any, in case of range partitioning, will be added to
+ * the result, because the specified range still satisfies the query's
+ * conditions.
+ */
+ if (partindex >= 0)
+ result = bms_add_member(result, partindex);
+ }
+
+ /* Add the null and/or default partition if needed and if present. */
+ if (final_result->scan_null)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(partition_bound_accepts_nulls(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->null_index);
+ }
+ if (final_result->scan_default)
+ {
+ Assert(context->strategy == PARTITION_STRATEGY_LIST ||
+ context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(partition_bound_has_default(context->boundinfo));
+ result = bms_add_member(result, context->boundinfo->default_index);
+ }
+
+ return result;
+}
+
+/*
+ * gen_partprune_steps_internal
+ * Processes 'clauses' to generate partition pruning steps.
+ *
+ * From OpExpr clauses that are mutually AND'd, we find combinations of those
+ * that match to the partition key columns and for every such combination,
+ * we emit a PartitionPruneStepOp containing a vector of expressions whose
+ * values are used as a look up key to search partitions by comparing the
+ * values with partition bounds. Relevant details of the operator and a
+ * vector of (possibly cross-type) comparison functions are also included with
+ * each step.
+ *
+ * For BoolExpr clauses, we recursively generate steps for each argument, and
+ * return a PartitionPruneStepCombine of their results.
+ *
+ * The generated steps are added to the context's steps list. Each step is
+ * assigned a unique step identifier, across recursive calls.
+ *
+ * If we find clauses that are mutually contradictory, or a pseudoconstant
+ * clause that contains false, we set *constfalse to true and return NIL (no
+ * pruning steps). Caller should consider all partitions as pruned in that
+ * case.
+ *
+ * Note: the 'clauses' List may be modified inside this function. Callers may
+ * want to make a copy of it before passing it to this function.
+ */
+static List *
+gen_partprune_steps_internal(RelOptInfo *rel, GeneratePruningStepsContext *context,
+ List *clauses, bool *constfalse)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ List *keyclauses[PARTITION_MAX_KEYS];
+ Bitmapset *nullkeys = NULL,
+ *notnullkeys = NULL;
+ bool generate_opsteps = false;
+ List *result = NIL;
+ ListCell *lc;
+
+ memset(keyclauses, 0, sizeof(keyclauses));
+ foreach(lc, clauses)
+ {
+ Expr *clause = (Expr *) lfirst(lc);
+ int i;
+
+ if (IsA(clause, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) clause;
+
+ clause = rinfo->clause;
+ if (rinfo->pseudoconstant &&
+ IsA(rinfo->clause, Const) &&
+ !DatumGetBool(((Const *) clause)->constvalue))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ }
+
+ /* Get the BoolExprs out of the way. */
+ if (IsA(clause, BoolExpr))
+ {
+ /*
+ * Generate steps for arguments.
+ *
+ * While steps generated for the arguments themselves will be
+ * added to context->steps during recursion and will be evaluated
+ * independently, collect their step IDs to be stored in the
+ * combine step we'll be creating.
+ */
+ if (or_clause((Node *) clause))
+ {
+ List *arg_stepids = NIL;
+ bool all_args_constfalse = true;
+ ListCell *lc1;
+
+ /*
+ * Get pruning step for each arg. If we get constfalse for
+ * all args, it means the OR expression is false as a whole.
+ */
+ foreach(lc1, ((BoolExpr *) clause)->args)
+ {
+ Expr *arg = lfirst(lc1);
+ bool arg_constfalse;
+ List *argsteps;
+
+ arg_constfalse = false;
+ argsteps =
+ gen_partprune_steps_internal(rel, context,
+ list_make1(arg),
+ &arg_constfalse);
+ if (!arg_constfalse)
+ all_args_constfalse = false;
+
+ if (argsteps != NIL)
+ {
+ PartitionPruneStep *step;
+
+ Assert(list_length(argsteps) == 1);
+ step = (PartitionPruneStep *) linitial(argsteps);
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+ else
+ {
+ /*
+ * No steps either means that arg_constfalse is true
+ * or the arg didn't contain a clause matching this
+ * partition key.
+ *
+ * In case of the latter, we cannot prune using such
+ * an arg. To indicate that to the pruning code, we
+ * must construct a dummy PartitionPruneStepCombine
+ * whose source_stepids is set to an empty List.
+ * However, if we can prove using constraint exclusion
+ * that the clause refutes the table's partition
+ * constraint (if it's sub-partitioned), we need not
+ * bother with that. That is, we effectively ignore
+ * this OR arm.
+ */
+ List *partconstr = rel->partition_qual;
+ PartitionPruneStep *orstep;
+
+ /* Just ignore this argument. */
+ if (arg_constfalse)
+ continue;
+
+ if (partconstr)
+ {
+ partconstr = (List *)
+ expression_planner((Expr *) partconstr);
+ if (rel->relid != 1)
+ ChangeVarNodes((Node *) partconstr, 1,
+ rel->relid, 0);
+ if (predicate_refuted_by(partconstr,
+ list_make1(arg),
+ false))
+ continue;
+ }
+
+ orstep = gen_prune_step_combine(context, NIL,
+ PARTPRUNE_COMBINE_UNION);
+ arg_stepids = lappend_int(arg_stepids, orstep->step_id);
+ }
+ }
+
+ *constfalse = all_args_constfalse;
+
+ /* Check if any contradicting clauses were found */
+ if (*constfalse)
+ return NIL;
+
+ if (arg_stepids != NIL)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, arg_stepids,
+ PARTPRUNE_COMBINE_UNION);
+ result = lappend(result, step);
+ }
+ continue;
+ }
+ else if (and_clause((Node *) clause))
+ {
+ List *args = ((BoolExpr *) clause)->args;
+ List *argsteps,
+ *arg_stepids = NIL;
+ ListCell *lc1;
+
+ /*
+ * args may themselves contain clauses of arbitrary type, so just
+ * recurse and later combine the component partition sets
+ * using a combine step.
+ */
+ *constfalse = false;
+ argsteps = gen_partprune_steps_internal(rel, context, args,
+ constfalse);
+ if (*constfalse)
+ return NIL;
+
+ foreach(lc1, argsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc1);
+
+ arg_stepids = lappend_int(arg_stepids, step->step_id);
+ }
+
+ if (arg_stepids)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, arg_stepids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ result = lappend(result, step);
+ }
+ continue;
+ }
+
+ /*
+ * Fall-through for a NOT clause, which if it's a Boolean clause,
+ * will be handled in match_clause_to_partition_key(). We
+ * currently don't perform any pruning for more complex NOT
+ * clauses.
+ */
+ }
+
+ /*
+ * Must be a clause for which we can check if one of its args matches
+ * the partition key.
+ */
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ Expr *partkey = linitial(rel->partexprs[i]);
+ bool clause_is_not_null = false;
+ PartClauseInfo *pc = NULL;
+ List *clause_steps = NIL;
+
+ switch (match_clause_to_partition_key(rel, context,
+ clause, partkey, i,
+ &clause_is_not_null,
+ &pc, &clause_steps))
+ {
+ case PARTCLAUSE_MATCH_CLAUSE:
+ Assert(pc != NULL);
+
+ /*
+ * Since we only allow strict operators, check for any
+ * contradicting IS NULL.
+ */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ generate_opsteps = true;
+ keyclauses[i] = lappend(keyclauses[i], pc);
+ break;
+
+ case PARTCLAUSE_MATCH_NULLNESS:
+ if (!clause_is_not_null)
+ {
+ /* check for conflicting IS NOT NULL */
+ if (bms_is_member(i, notnullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ nullkeys = bms_add_member(nullkeys, i);
+ }
+ else
+ {
+ /* check for conflicting IS NULL */
+ if (bms_is_member(i, nullkeys))
+ {
+ *constfalse = true;
+ return NIL;
+ }
+ notnullkeys = bms_add_member(notnullkeys, i);
+ }
+ break;
+
+ case PARTCLAUSE_MATCH_STEPS:
+ Assert(clause_steps != NIL);
+ result = list_concat(result, clause_steps);
+ break;
+
+ case PARTCLAUSE_MATCH_CONTRADICT:
+ /* We've nothing more to do if a contradiction was found. */
+ *constfalse = true;
+ return NIL;
+
+ case PARTCLAUSE_NOMATCH:
+
+ /*
+ * Clause didn't match this key, but it might match the
+ * next one.
+ */
+ continue;
+
+ case PARTCLAUSE_UNSUPPORTED:
+ /* This clause cannot be used for pruning. */
+ break;
+
+ default:
+ Assert(false);
+ break;
+ }
+
+ /* done; go check the next clause. */
+ break;
+ }
+ }
+
+ /*
+ * If generate_opsteps is set to false it means no OpExprs were directly
+ * present in the input list.
+ */
+ if (!generate_opsteps)
+ {
+ /*
+ * Generate one prune step for the information derived from IS NULL,
+ * if any. To prune hash partitions, we must have found IS NULL
+ * clauses for all partition keys.
+ */
+ if (!bms_is_empty(nullkeys) &&
+ (part_scheme->strategy != PARTITION_STRATEGY_HASH ||
+ bms_num_members(nullkeys) == part_scheme->partnatts))
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context, 0, false, NIL, NIL,
+ nullkeys);
+ result = lappend(result, step);
+ }
+
+ /*
+ * Note that for IS NOT NULL clauses, simply having a step suffices;
+ * there is no need to propagate the exact details of which keys are
+ * required to be NOT NULL. Hash partitioning expects to see actual
+ * values to perform any pruning.
+ */
+ if (!bms_is_empty(notnullkeys) &&
+ part_scheme->strategy != PARTITION_STRATEGY_HASH)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context, 0, false, NIL, NIL, NULL);
+ result = lappend(result, step);
+ }
+ }
+ else
+ {
+ PartitionPruneStep *step;
+
+ /* Generate pruning steps from OpExpr clauses in keyclauses. */
+ step = gen_prune_steps_from_opexps(part_scheme, context,
+ keyclauses, nullkeys);
+ if (step != NULL)
+ result = lappend(result, step);
+ }
+
+ /*
+ * Finally, results from all entries appearing in result should be
+ * combined using an INTERSECT combine step, if more than one.
+ */
+ if (list_length(result) > 1)
+ {
+ List *step_ids = NIL;
+
+ foreach(lc, result)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ step_ids = lappend_int(step_ids, step->step_id);
+ }
+
+ if (step_ids != NIL)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_combine(context, step_ids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Generate a pruning step for a specific operator.
+ *
+ * The step is assigned a unique step identifier and added to context's 'steps'
+ * list.
+ */
+static PartitionPruneStep *
+gen_prune_step_op(GeneratePruningStepsContext *context,
+ int opstrategy, bool op_is_ne,
+ List *exprs, List *cmpfns,
+ Bitmapset *nullkeys)
+{
+ PartitionPruneStepOp *opstep = makeNode(PartitionPruneStepOp);
+
+ opstep->step.step_id = context->next_step_id++;
+
+ /*
+ * For clauses that contain an <> operator, set opstrategy to
+ * InvalidStrategy to signal get_matching_list_bounds to do the right
+ * thing.
+ */
+ if (op_is_ne)
+ {
+ Assert(opstrategy == BTEqualStrategyNumber);
+ opstep->opstrategy = InvalidStrategy;
+ }
+ else
+ opstep->opstrategy = opstrategy;
+ Assert(list_length(exprs) == list_length(cmpfns));
+ opstep->exprs = exprs;
+ opstep->cmpfns = cmpfns;
+ opstep->nullkeys = nullkeys;
+
+ context->steps = lappend(context->steps, opstep);
+
+ return (PartitionPruneStep *) opstep;
+}
+
+/*
+ * Generate a pruning step for a combination of several other steps.
+ *
+ * The step is assigned a unique step identifier and added to context's
+ * 'steps' list.
+ */
+static PartitionPruneStep *
+gen_prune_step_combine(GeneratePruningStepsContext *context,
+ List *source_stepids,
+ PartitionPruneCombineOp combineOp)
+{
+ PartitionPruneStepCombine *cstep = makeNode(PartitionPruneStepCombine);
+
+ cstep->step.step_id = context->next_step_id++;
+ cstep->combineOp = combineOp;
+ cstep->source_stepids = source_stepids;
+
+ context->steps = lappend(context->steps, cstep);
+
+ return (PartitionPruneStep *) cstep;
+}
+
+/*
+ * gen_prune_steps_from_opexps
+ *
+ * 'keyclauses' contains one list of clauses per partition key. We check here
+ * if we have found clauses for a valid subset of the partition key. In some
+ * cases, (depending on the type of partitioning being used) if we didn't
+ * find clauses for a given key, we discard clauses that may have been
+ * found for any subsequent keys; see specific notes below.
+ */
+static PartitionPruneStep *
+gen_prune_steps_from_opexps(PartitionScheme part_scheme,
+ GeneratePruningStepsContext *context,
+ List **keyclauses, Bitmapset *nullkeys)
+{
+ ListCell *lc;
+ List *opsteps = NIL;
+ List *btree_clauses[BTMaxStrategyNumber],
+ *hash_clauses[HTMaxStrategyNumber];
+ bool need_next_less,
+ need_next_eq,
+ need_next_greater;
+ int i;
+
+ memset(btree_clauses, 0, sizeof(btree_clauses));
+ memset(hash_clauses, 0, sizeof(hash_clauses));
+ for (i = 0; i < part_scheme->partnatts; i++)
+ {
+ List *clauselist = keyclauses[i];
+ bool consider_next_key = true;
+
+ /*
+ * To be useful for pruning, we must have clauses for a prefix of
+ * partition keys in the case of range partitioning. So, ignore
+ * clauses for keys after this one.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ clauselist == NIL)
+ break;
+
+ /*
+ * For hash partitioning, if a column doesn't have the necessary
+ * equality clause, there should be an IS NULL clause, otherwise
+ * pruning is not possible.
+ */
+ if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
+ clauselist == NIL && !bms_is_member(i, nullkeys))
+ return NULL;
+
+ need_next_eq = need_next_less = need_next_greater = true;
+ foreach(lc, clauselist)
+ {
+ PartClauseInfo *pc = (PartClauseInfo *) lfirst(lc);
+ Oid lefttype,
+ righttype;
+
+ /* Look up the operator's btree/hash strategy number. */
+ if (pc->op_strategy == InvalidStrategy)
+ get_op_opfamily_properties(pc->opno,
+ part_scheme->partopfamily[i],
+ false,
+ &pc->op_strategy,
+ &lefttype,
+ &righttype);
+
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ PartClauseInfo *last = NULL;
+ bool inclusive = false;
+
+ /*
+ * Add this clause to the list of clauses to be used
+ * for pruning if this is the first such key for this
+ * operator strategy or if it is consecutively next to
+ * the last column for which a clause with this
+ * operator strategy was matched.
+ */
+ if (btree_clauses[pc->op_strategy - 1] != NIL)
+ last = llast(btree_clauses[pc->op_strategy - 1]);
+
+ if (last == NULL ||
+ i == last->keyno || i == last->keyno + 1)
+ btree_clauses[pc->op_strategy - 1] =
+ lappend(btree_clauses[pc->op_strategy - 1], pc);
+
+ /*
+ * We may not need clauses for the next key if this
+ * clause is of a certain strategy.
+ */
+ switch (pc->op_strategy)
+ {
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_less = false;
+ break;
+ case BTEqualStrategyNumber:
+ /* always accept clauses for the next key. */
+ break;
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ if (!inclusive)
+ need_next_eq = need_next_greater = false;
+ break;
+ }
+
+ /* We may want to change our mind. */
+ if (consider_next_key)
+ consider_next_key = (need_next_eq ||
+ need_next_less ||
+ need_next_greater);
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ if (pc->op_strategy != HTEqualStrategyNumber)
+ elog(ERROR, "invalid clause for hash partitioning");
+ hash_clauses[pc->op_strategy - 1] =
+ lappend(hash_clauses[pc->op_strategy - 1], pc);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+ }
+
+ /*
+ * If we've decided that clauses for subsequent partition keys
+ * wouldn't be useful for pruning, don't search any further.
+ */
+ if (!consider_next_key)
+ break;
+ }
+
+ /*
+ * Now, we have divided clauses according to their operator strategies.
+ * Check for each strategy if we can generate pruning step(s) by
+ * collecting a list of expressions whose values will constitute a vector
+ * that can be used as a lookup key by a partition bound searching
+ * function.
+ */
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ {
+ List *eq_clauses = btree_clauses[BTEqualStrategyNumber - 1];
+ List *le_clauses = btree_clauses[BTLessEqualStrategyNumber - 1];
+ List *ge_clauses = btree_clauses[BTGreaterEqualStrategyNumber - 1];
+
+ /*
+ * For each clause under consideration for a given strategy,
+ * we collect expressions from clauses for earlier keys, whose
+ * operator strategy is inclusive, into a list called
+ * 'prefix'. By appending the clause's own expression to the
+ * 'prefix', we'll generate one step using the so generated
+ * vector and assign the current strategy to it. Actually,
+ * 'prefix' might contain multiple clauses for the same key,
+ * in which case, we must generate steps for various
+ * combinations of expressions of different keys, which
+ * get_steps_using_prefix takes care of for us.
+ */
+ for (i = 0; i < BTMaxStrategyNumber; i++)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+
+ foreach(lc, btree_clauses[i])
+ {
+ ListCell *lc1;
+ List *prefix = NIL;
+
+ /* Clause under consideration. */
+ pc = lfirst(lc);
+
+ /*
+ * Expressions from = clauses can always be in the
+ * prefix, provided they're from an earlier key.
+ */
+ foreach(lc1, eq_clauses)
+ {
+ PartClauseInfo *eqpc = lfirst(lc1);
+
+ if (eqpc->keyno == pc->keyno)
+ break;
+ if (eqpc->keyno < pc->keyno)
+ prefix = lappend(prefix, eqpc);
+ }
+
+ /*
+ * If we're generating steps for </<= strategy, we can
+ * add other <= clauses to the prefix, provided
+ * they're from an earlier key.
+ */
+ if (i == BTLessStrategyNumber - 1 ||
+ i == BTLessEqualStrategyNumber - 1)
+ {
+ foreach(lc1, le_clauses)
+ {
+ PartClauseInfo *lepc = lfirst(lc1);
+
+ if (lepc->keyno == pc->keyno)
+ break;
+ if (lepc->keyno < pc->keyno)
+ prefix = lappend(prefix, lepc);
+ }
+ }
+
+ /*
+ * If we're generating steps for >/>= strategy, we can
+ * add other >= clauses to the prefix, provided
+ * they're from an earlier key.
+ */
+ if (i == BTGreaterStrategyNumber - 1 ||
+ i == BTGreaterEqualStrategyNumber - 1)
+ {
+ foreach(lc1, ge_clauses)
+ {
+ PartClauseInfo *gepc = lfirst(lc1);
+
+ if (gepc->keyno == pc->keyno)
+ break;
+ if (gepc->keyno < pc->keyno)
+ prefix = lappend(prefix, gepc);
+ }
+ }
+
+ /*
+ * As mentioned above, if 'prefix' contains multiple
+ * expressions for the same key, the following will
+ * generate multiple steps, one for each combination
+ * of the expressions for different keys.
+ *
+ * Note that we pass NULL for step_nullkeys, because
+ * we don't search list/range partition bounds where
+ * some keys are NULL.
+ */
+ Assert(pc->op_strategy == i + 1);
+ pc_steps = get_steps_using_prefix(context, i + 1,
+ pc->op_is_ne,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ NULL,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ case PARTITION_STRATEGY_HASH:
+ {
+ List *eq_clauses = hash_clauses[HTEqualStrategyNumber - 1];
+
+ /* For hash partitioning, we have just the = strategy. */
+ if (eq_clauses != NIL)
+ {
+ PartClauseInfo *pc;
+ List *pc_steps;
+ List *prefix = NIL;
+ int last_keyno;
+ ListCell *lc1;
+
+ /*
+ * Locate the clause for the greatest column. This may
+ * not belong to the last partition key, but it is the
+ * clause belonging to the last partition key we found a
+ * clause for above.
+ */
+ pc = llast(eq_clauses);
+
+ /*
+ * There might be multiple clauses which matched to that
+ * partition key; find the first such clause. While at
+ * it, add all the clauses before that one to 'prefix'.
+ */
+ last_keyno = pc->keyno;
+ foreach(lc, eq_clauses)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == last_keyno)
+ break;
+ prefix = lappend(prefix, pc);
+ }
+
+ /*
+ * For each clause for the "last" column, after appending
+ * the clause's own expression to the 'prefix', we'll
+ * generate one step using the so generated vector and
+ * assign = as its strategy. Actually, 'prefix' might
+ * contain multiple clauses for the same key, in which
+ * case, we must generate steps for various combinations
+ * of expressions of different keys, which
+ * get_steps_using_prefix will take care of for us.
+ */
+ for_each_cell(lc1, lc)
+ {
+ pc = lfirst(lc1);
+
+ /*
+ * Note that we pass nullkeys for step_nullkeys,
+ * because we need to tell hash partition bound search
+ * function which of the keys we found IS NULL clauses
+ * for.
+ */
+ Assert(pc->op_strategy == HTEqualStrategyNumber);
+ pc_steps =
+ get_steps_using_prefix(context,
+ HTEqualStrategyNumber,
+ false,
+ pc->expr,
+ pc->cmpfn,
+ pc->keyno,
+ nullkeys,
+ prefix);
+ opsteps = list_concat(opsteps, list_copy(pc_steps));
+ }
+ }
+ break;
+ }
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* Finally, add a combine step to mutually AND opsteps, if needed. */
+ if (list_length(opsteps) > 1)
+ {
+ List *opstep_ids = NIL;
+
+ foreach(lc, opsteps)
+ {
+ PartitionPruneStep *step = lfirst(lc);
+
+ opstep_ids = lappend_int(opstep_ids, step->step_id);
+ }
+
+ if (opstep_ids != NIL)
+ return gen_prune_step_combine(context, opstep_ids,
+ PARTPRUNE_COMBINE_INTERSECT);
+ return NULL;
+ }
+ else if (opsteps != NIL)
+ return linitial(opsteps);
+
+ return NULL;
+}
+
+/*
+ * If the partition key has a collation, then the clause must have the same
+ * input collation. If the partition key is non-collatable, we assume the
+ * collation doesn't matter, because while collation wasn't considered when
+ * performing partitioning, the clause still may have a collation assigned
+ * due to the other input being of a collatable type.
+ *
+ * See also IndexCollMatchesExprColl.
+ */
+#define PartCollMatchesExprColl(partcoll, exprcoll) \
+ ((partcoll) == InvalidOid || (partcoll) == (exprcoll))
+
+/*
+ * match_clause_to_partition_key
+ * Attempt to match the given 'clause' with the specified partition key.
+ *
+ * Return value is:
+ * * PARTCLAUSE_NOMATCH if the clause doesn't match this partition key (but
+ * caller should keep trying, because it might match a subsequent key).
+ * Output arguments: none set.
+ *
+ * * PARTCLAUSE_MATCH_CLAUSE if there is a match.
+ * Output arguments: *pc is set to a PartClauseInfo constructed for the
+ * matched clause.
+ *
+ * * PARTCLAUSE_MATCH_NULLNESS if there is a match, and the matched clause was
+ * either an "a IS NULL" or "a IS NOT NULL" clause.
+ * Output arguments: *clause_is_not_null is set to false in the former case,
+ * true otherwise.
+ *
+ * * PARTCLAUSE_MATCH_STEPS if there is a match.
+ * Output arguments: *clause_steps is set to a list of PartitionPruneStep
+ * generated for the clause.
+ *
+ * * PARTCLAUSE_MATCH_CONTRADICT if the clause is self-contradictory. This can
+ * only happen if it's a BoolExpr whose arguments are self-contradictory.
+ * Output arguments: none set.
+ *
+ * * PARTCLAUSE_UNSUPPORTED if the clause cannot be used for pruning at all
+ * due to one of its properties, such as argument volatility, even if it may
+ * have been matched with a key.
+ * Output arguments: none set.
+ */
+static PartClauseMatchStatus
+match_clause_to_partition_key(RelOptInfo *rel,
+ GeneratePruningStepsContext *context,
+ Expr *clause, Expr *partkey, int partkeyidx,
+ bool *clause_is_not_null, PartClauseInfo **pc,
+ List **clause_steps)
+{
+ PartitionScheme part_scheme = rel->part_scheme;
+ Expr *expr;
+ Oid partopfamily = part_scheme->partopfamily[partkeyidx],
+ partcoll = part_scheme->partcollation[partkeyidx];
+
+ /*
+ * Recognize specially shaped clauses that match with the Boolean
+ * partition key.
+ */
+ if (match_boolean_partition_clause(partopfamily, clause, partkey, &expr))
+ {
+ PartClauseInfo *partclause;
+
+ partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ partclause->keyno = partkeyidx;
+ /* Do pruning with the Boolean equality operator. */
+ partclause->opno = BooleanEqualOperator;
+ partclause->op_is_ne = false;
+ partclause->expr = expr;
+ /* We know that expr is of Boolean type. */
+ partclause->cmpfn = rel->part_scheme->partsupfunc[partkeyidx].fn_oid;
+ partclause->op_strategy = InvalidStrategy;
+
+ *pc = partclause;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, OpExpr) &&
+ list_length(((OpExpr *) clause)->args) == 2)
+ {
+ OpExpr *opclause = (OpExpr *) clause;
+ Expr *leftop,
+ *rightop;
+ Oid commutator = InvalidOid,
+ negator = InvalidOid;
+ Oid cmpfn;
+ Oid exprtype;
+ bool is_opne_listp = false;
+ PartClauseInfo *partclause;
+
+ leftop = (Expr *) get_leftop(clause);
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+ rightop = (Expr *) get_rightop(clause);
+ if (IsA(rightop, RelabelType))
+ rightop = ((RelabelType *) rightop)->arg;
+
+ /* check if the clause matches this partition key */
+ if (equal(leftop, partkey))
+ expr = rightop;
+ else if (equal(rightop, partkey))
+ {
+ expr = leftop;
+ commutator = get_commutator(opclause->opno);
+
+ /* nothing we can do unless we can swap the operands */
+ if (!OidIsValid(commutator))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ /* clause does not match this partition key, but perhaps next. */
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Partition key also consists of a collation that's specified for it,
+ * so try to match it too. There may be multiple keys with the same
+ * expression but different collations.
+ */
+ if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Now check various properties of the clause
+ * to see if it's sane to use it for pruning. If any of the
+ * properties makes it unsuitable for pruning, then the clause is
+ * useless no matter which key it's matched to.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(opclause->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* We can't use any volatile expressions to prune partitions. */
+ if (contain_volatile_functions((Node *) expr))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * Normally we only bother with operators that are listed as being
+ * part of the partitioning operator family. But we make an exception
+ * in one case -- operators named '<>' are not listed in any operator
+ * family whatsoever, in which case, we try to perform partition
+ * pruning with it only if list partitioning is in use.
+ */
+ if (!op_in_opfamily(opclause->opno, partopfamily))
+ {
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * To confirm if the operator is really '<>', check if its negator
+ * is a btree equality operator.
+ */
+ negator = get_negator(opclause->opno);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ Oid lefttype;
+ Oid righttype;
+ int strategy;
+
+ get_op_opfamily_properties(negator, partopfamily, false,
+ &strategy, &lefttype, &righttype);
+
+ if (strategy == BTEqualStrategyNumber)
+ is_opne_listp = true;
+ }
+
+ /* Operator isn't really what we were hoping it'd be. */
+ if (!is_opne_listp)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+
+ /* Check if we're going to need a cross-type comparison function. */
+ exprtype = exprType((Node *) expr);
+ if (exprtype != part_scheme->partopcintype[partkeyidx])
+ {
+ switch (part_scheme->strategy)
+ {
+ case PARTITION_STRATEGY_LIST:
+ case PARTITION_STRATEGY_RANGE:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ part_scheme->partopcintype[partkeyidx],
+ exprtype, BTORDER_PROC);
+ break;
+
+ case PARTITION_STRATEGY_HASH:
+ cmpfn =
+ get_opfamily_proc(part_scheme->partopfamily[partkeyidx],
+ exprtype, exprtype, HASHEXTENDED_PROC);
+ break;
+
+ default:
+ elog(ERROR, "invalid partition strategy: %c",
+ part_scheme->strategy);
+ break;
+ }
+
+ /* If we couldn't find one, we cannot use this expression. */
+ if (!OidIsValid(cmpfn))
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ else
+ cmpfn = part_scheme->partsupfunc[partkeyidx].fn_oid;
+
+ partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
+ partclause->keyno = partkeyidx;
+
+ /* For <> operator clauses, pass on the negator. */
+ partclause->op_is_ne = false;
+ partclause->op_strategy = InvalidStrategy;
+
+ if (is_opne_listp)
+ {
+ Assert(OidIsValid(negator));
+ partclause->opno = negator;
+ partclause->op_is_ne = true;
+
+ /*
+ * We already know the strategy in this case, so may as well set
+ * it rather than having to look it up later.
+ */
+ partclause->op_strategy = BTEqualStrategyNumber;
+ }
+ /* And if commuted before matching, pass on the commutator */
+ else if (OidIsValid(commutator))
+ partclause->opno = commutator;
+ else
+ partclause->opno = opclause->opno;
+
+ partclause->expr = expr;
+ partclause->cmpfn = cmpfn;
+
+ *pc = partclause;
+
+ return PARTCLAUSE_MATCH_CLAUSE;
+ }
+ else if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Oid saop_op = saop->opno;
+ Oid saop_coll = saop->inputcollid;
+ Expr *leftop = (Expr *) linitial(saop->args),
+ *rightop = (Expr *) lsecond(saop->args);
+ List *elem_exprs,
+ *elem_clauses;
+ ListCell *lc1;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Check it matches this partition key */
+ if (!equal(leftop, partkey) ||
+ !PartCollMatchesExprColl(partcoll, saop->inputcollid))
+ return PARTCLAUSE_NOMATCH;
+
+ /*
+ * Matched with this key. Check various properties of the clause to
+ * see if it can sanely be used for partition pruning.
+ */
+
+ /*
+ * Only allow strict operators. This will guarantee nulls are
+ * filtered.
+ */
+ if (!op_strict(saop->opno))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /* Useless if the array has any volatile functions. */
+ if (contain_volatile_functions((Node *) rightop))
+ return PARTCLAUSE_UNSUPPORTED;
+
+ /*
+ * In case of NOT IN (..), we get a '<>', which we handle if list
+ * partitioning is in use and we're able to confirm that its negator
+ * is a btree equality operator belonging to the partitioning operator
+ * family.
+ */
+ if (!op_in_opfamily(saop_op, partopfamily))
+ {
+ Oid negator;
+
+ if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ negator = get_negator(saop_op);
+ if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
+ {
+ int strategy;
+ Oid lefttype,
+ righttype;
+
+ get_op_opfamily_properties(negator, partopfamily,
+ false, &strategy,
+ &lefttype, &righttype);
+ if (strategy != BTEqualStrategyNumber)
+ return PARTCLAUSE_UNSUPPORTED;
+ }
+ }
+
+ /*
+ * First generate a list of Const nodes, one for each array element
+ * (excepting nulls).
+ */
+ elem_exprs = NIL;
+ if (IsA(rightop, Const))
+ {
+ Const *arr = castNode(Const, rightop);
+ ArrayType *arrval = DatumGetArrayTypeP(arr->constvalue);
+ int16 elemlen;
+ bool elembyval;
+ char elemalign;
+ Datum *elem_values;
+ bool *elem_nulls;
+ int num_elems,
+ i;
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arrval),
+ &elemlen, &elembyval, &elemalign);
+ deconstruct_array(arrval,
+ ARR_ELEMTYPE(arrval),
+ elemlen, elembyval, elemalign,
+ &elem_values, &elem_nulls,
+ &num_elems);
+ for (i = 0; i < num_elems; i++)
+ {
+ Const *elem_expr;
+
+ /* Only consider non-null values. */
+ if (elem_nulls[i])
+ continue;
+
+ elem_expr = makeConst(ARR_ELEMTYPE(arrval), -1,
+ arr->constcollid, elemlen,
+ elem_values[i], false, elembyval);
+ elem_exprs = lappend(elem_exprs, elem_expr);
+ }
+ }
+ else
+ {
+ ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
+
+ /*
+ * For a nested ArrayExpr, we don't know how to get the actual
+ * scalar values out into a flat list, so we give up doing
+ * anything with this ScalarArrayOpExpr.
+ */
+ if (arrexpr->multidims)
+ return PARTCLAUSE_UNSUPPORTED;
+
+ elem_exprs = arrexpr->elements;
+ }
+
+ /*
+ * Now generate a list of clauses, one for each array element, of the
+ * form saop_leftop saop_op elem_expr
+ */
+ elem_clauses = NIL;
+ foreach(lc1, elem_exprs)
+ {
+ Expr *rightop = (Expr *) lfirst(lc1),
+ *elem_clause;
+
+ elem_clause = make_opclause(saop_op, BOOLOID, false,
+ leftop, rightop,
+ InvalidOid, saop_coll);
+ elem_clauses = lappend(elem_clauses, elem_clause);
+ }
+
+ /*
+ * Build a combine step as if for an OR clause or add the clauses to
+ * the end of the list that's being processed currently.
+ */
+ if (saop->useOr && list_length(elem_clauses) > 1)
+ {
+ Expr *orexpr;
+ bool constfalse = false;
+
+ orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
+ *clause_steps =
+ gen_partprune_steps_internal(rel, context, list_make1(orexpr),
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+
+ Assert(list_length(*clause_steps) == 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ else
+ {
+ bool constfalse = false;
+
+ *clause_steps =
+ gen_partprune_steps_internal(rel, context, elem_clauses,
+ &constfalse);
+ if (constfalse)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ Assert(list_length(*clause_steps) >= 1);
+ return PARTCLAUSE_MATCH_STEPS;
+ }
+ }
+ else if (IsA(clause, NullTest))
+ {
+ NullTest *nulltest = (NullTest *) clause;
+ Expr *arg = nulltest->arg;
+
+ if (IsA(arg, RelabelType))
+ arg = ((RelabelType *) arg)->arg;
+
+ /* Does arg match with this partition key column? */
+ if (!equal(arg, partkey))
+ return PARTCLAUSE_NOMATCH;
+
+ *clause_is_not_null = nulltest->nulltesttype == IS_NOT_NULL;
+
+ return PARTCLAUSE_MATCH_NULLNESS;
+ }
+
+ return PARTCLAUSE_UNSUPPORTED;
+}
+
+/*
+ * get_steps_using_prefix
+ * Generate list of PartitionPruneStepOp steps each consisting of given
+ * opstrategy
+ *
+ * To generate steps, step_lastexpr and step_lastcmpfn are appended to
+ * expressions and cmpfns, respectively, extracted from the clauses in
+ * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the
+ * same partition key column, we must generate steps for various combinations
+ * of the clauses of different keys.
+ */
+static List *
+get_steps_using_prefix(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ List *prefix)
+{
+ /* Quick exit if there are no values to prefix with. */
+ if (list_length(prefix) == 0)
+ {
+ PartitionPruneStep *step;
+
+ step = gen_prune_step_op(context,
+ step_opstrategy,
+ step_op_is_ne,
+ list_make1(step_lastexpr),
+ list_make1_oid(step_lastcmpfn),
+ step_nullkeys);
+ return list_make1(step);
+ }
+
+ /* Recurse to generate steps for various combinations. */
+ return get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ list_head(prefix),
+ NIL, NIL);
+}
+
+/*
+ * get_steps_using_prefix_recurse
+ * Recursively generate combinations of clauses for different partition
+ * keys and start generating steps upon reaching clauses for the greatest
+ * column that is less than the one for which we're currently generating
+ * steps (that is, step_lastkeyno)
+ *
+ * 'start' is where we should start iterating for the current invocation.
+ * 'step_exprs' and 'step_cmpfns' each contains the expressions and cmpfns
+ * we've generated so far from the clauses for the previous part keys.
+ */
+static List *
+get_steps_using_prefix_recurse(GeneratePruningStepsContext *context,
+ int step_opstrategy,
+ bool step_op_is_ne,
+ Expr *step_lastexpr,
+ Oid step_lastcmpfn,
+ int step_lastkeyno,
+ Bitmapset *step_nullkeys,
+ ListCell *start,
+ List *step_exprs,
+ List *step_cmpfns)
+{
+ List *result = NIL;
+ ListCell *lc;
+ int cur_keyno;
+
+ /* Actually, recursion would be limited by PARTITION_MAX_KEYS. */
+ check_stack_depth();
+
+ /* Check if we need to recurse. */
+ Assert(start != NULL);
+ cur_keyno = ((PartClauseInfo *) lfirst(start))->keyno;
+ if (cur_keyno < step_lastkeyno - 1)
+ {
+ PartClauseInfo *pc;
+ ListCell *next_start;
+
+ /*
+ * For each clause with cur_keyno, add its expr and cmpfn to
+ * step_exprs and step_cmpfns, respectively, and recurse after setting
+ * next_start to the ListCell of the first clause for the next
+ * partition key.
+ */
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+
+ if (pc->keyno > cur_keyno)
+ break;
+ }
+ next_start = lc;
+
+ for_each_cell(lc, start)
+ {
+ pc = lfirst(lc);
+ if (pc->keyno == cur_keyno)
+ {
+ /* clean up before starting a new recursion cycle. */
+ if (cur_keyno == 0)
+ {
+ list_free(step_exprs);
+ list_free(step_cmpfns);
+ step_exprs = list_make1(pc->expr);
+ step_cmpfns = list_make1_oid(pc->cmpfn);
+ }
+ else
+ {
+ step_exprs = lappend(step_exprs, pc->expr);
+ step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn);
+ }
+ }
+ else
+ {
+ Assert(pc->keyno > cur_keyno);
+ break;
+ }
+
+ result =
+ list_concat(result,
+ get_steps_using_prefix_recurse(context,
+ step_opstrategy,
+ step_op_is_ne,
+ step_lastexpr,
+ step_lastcmpfn,
+ step_lastkeyno,
+ step_nullkeys,
+ next_start,
+ step_exprs,
+ step_cmpfns));
+ }
+ }
+ else
+ {
+ /*
+ * End the current recursion cycle and start generating steps, one for
+ * each clause with cur_keyno, which is all clauses from here onward
+ * till the end of the list.
+ */
+ Assert(list_length(step_exprs) == cur_keyno);
+ for_each_cell(lc, start)
+ {
+ PartClauseInfo *pc = lfirst(lc);
+ PartitionPruneStep *step;
+ List *step_exprs1,
+ *step_cmpfns1;
+
+ Assert(pc->keyno == cur_keyno);
+
+ /* Leave the original step_exprs unmodified. */
+ step_exprs1 = list_copy(step_exprs);
+ step_exprs1 = lappend(step_exprs1, pc->expr);
+ step_exprs1 = lappend(step_exprs1, step_lastexpr);
+
+ /* Leave the original step_cmpfns unmodified. */
+ step_cmpfns1 = list_copy(step_cmpfns);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn);
+ step_cmpfns1 = lappend_oid(step_cmpfns1, step_lastcmpfn);
+
+ step = gen_prune_step_op(context,
+ step_opstrategy, step_op_is_ne,
+ step_exprs1, step_cmpfns1,
+ step_nullkeys);
+ result = lappend(result, step);
+ }
+ }
+
+ return result;
+}
+
+/*
+ * get_matching_hash_bounds
+ * Determine offset of the hash bound matching the specified values,
+ * considering that all the non-null values come from clauses containing
+ * a compatible hash equality operator and any keys that are null come
+ * from an IS NULL clause.
+ *
+ * Generally this function will return a single matching bound offset,
+ * although if a partition has not been set up for a given modulus then we may
+ * return no matches. If the number of clauses found doesn't cover the entire
+ * partition key, then we'll need to return all offsets.
+ *
+ * 'opstrategy' if non-zero must be HTEqualStrategyNumber.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', the number of Datums in the 'values' array.
+ *
+ * 'partsupfunc' contains partition hashing functions that can produce correct
+ * hash for the type of the values contained in 'values'.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_hash_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int *partindices = boundinfo->indexes;
+ int partnatts = context->partnatts;
+ bool isnull[PARTITION_MAX_KEYS];
+ int i;
+ uint64 rowHash;
+ int greatest_modulus;
+
+ Assert(context->strategy == PARTITION_STRATEGY_HASH);
+
+ /*
+ * For hash partitioning we can only perform pruning based on equality
+ * clauses to the partition key or IS NULL clauses. We also can only
+ * prune if we got values for all keys.
+ */
+ if (nvalues + bms_num_members(nullkeys) == partnatts)
+ {
+ /*
+ * If there are any values, they must have come from clauses
+ * containing an equality operator compatible with hash partitioning.
+ */
+ Assert(opstrategy == HTEqualStrategyNumber || nvalues == 0);
+
+ for (i = 0; i < partnatts; i++)
+ isnull[i] = bms_is_member(i, nullkeys);
+
+ greatest_modulus = get_hash_partition_greatest_modulus(boundinfo);
+ rowHash = compute_hash_value(partnatts, partsupfunc, values, isnull);
+
+ if (partindices[rowHash % greatest_modulus] >= 0)
+ result->bound_offsets =
+ bms_make_singleton(rowHash % greatest_modulus);
+ }
+ else
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ /*
+ * There is neither a special hash null partition nor a default hash
+ * partition.
+ */
+ result->scan_null = result->scan_default = false;
+
+ return result;
+}
+
+/*
+ * get_matching_list_bounds
+ * Determine the offsets of list bounds matching the specified value,
+ * according to the semantics of the given operator strategy
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'value' contains the value to use for pruning.
+ *
+ * 'nvalues', if non-zero, should be exactly 1, because of list partitioning.
+ *
+ * 'partsupfunc' contains the list partitioning comparison function to be used
+ * to perform partition_list_bsearch
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_list_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum value, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ int off,
+ minoff,
+ maxoff;
+ bool is_equal;
+ bool inclusive = false;
+ Oid *partcollation = context->partcollation;
+
+ Assert(context->strategy == PARTITION_STRATEGY_LIST);
+ Assert(context->partnatts == 1);
+
+ result->scan_null = result->scan_default = false;
+
+ if (!bms_is_empty(nullkeys))
+ {
+ /*
+ * Nulls may exist in only one partition - the partition whose
+ * accepted set of values includes null or the default partition if
+ * the former doesn't exist.
+ */
+ if (partition_bound_accepts_nulls(boundinfo))
+ result->scan_null = true;
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /*
+ * If there are no datums to compare keys with, but there are partitions,
+ * just return the default partition if one exists.
+ */
+ if (boundinfo->ndatums == 0)
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums - 1;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default if any.
+ */
+ if (nvalues == 0)
+ {
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ /* Special case handling of values coming from a <> operator clause. */
+ if (opstrategy == InvalidStrategy)
+ {
+ /*
+ * First match to all bounds. We'll remove any matching datums below.
+ */
+ result->bound_offsets = bms_add_range(NULL, 0,
+ boundinfo->ndatums - 1);
+
+ off = partition_list_bsearch(partsupfunc, partcollation, boundinfo,
+ value, &is_equal);
+ if (off >= 0 && is_equal)
+ {
+
+ /* We have a match. Remove from the result. */
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_del_member(result->bound_offsets,
+ off);
+ }
+
+ /* Always include the default partition if any. */
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ return result;
+ }
+
+ /*
+ * With range queries, always include the default list partition; because
+ * list partitions divide the key space in a discontinuous manner, not all
+ * values in the given range will have a partition assigned. This may not
+ * technically be true for some data types (e.g. integer types), however,
+ * we currently lack any sort of infrastructure to provide us with proofs
+ * that would allow us to do anything smarter here.
+ */
+ if (opstrategy != BTEqualStrategyNumber)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal)
+ {
+ Assert(boundinfo->indexes[off] >= 0);
+ result->bound_offsets = bms_make_singleton(off);
+ }
+ else
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0)
+ {
+ /* We don't want the matched datum to be in the result. */
+ if (!is_equal || !inclusive)
+ off++;
+ }
+ else
+ {
+ /*
+ * This case means all partition bounds are greater, which in
+ * turn means that all partitions satisfy this key.
+ */
+ off = 0;
+ }
+
+ /*
+ * off is greater than the number of datums we have partitions
+ * for. The only possible partition that could contain a match is
+ * the default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off > boundinfo->ndatums - 1)
+ return result;
+
+ minoff = off;
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+ off = partition_list_bsearch(partsupfunc,
+ partcollation,
+ boundinfo, value,
+ &is_equal);
+ if (off >= 0 && is_equal && !inclusive)
+ off--;
+
+ /*
+ * off is smaller than the datums of all non-default partitions.
+ * The only possible partition that could contain a match is the
+ * default partition, but we must've set result->scan_default
+ * above anyway if one exists.
+ */
+ if (off < 0)
+ return result;
+
+ maxoff = off;
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ return result;
+}
+
+
+/*
+ * get_matching_range_bounds
+ * Determine the offsets of range bounds matching the specified values,
+ * according to the semantics of the given operator strategy
+ *
+ * Each datum whose offset is in result is to be treated as the upper bound of
+ * the partition that will contain the desired values.
+ *
+ * If default partition needs to be scanned for given values, set scan_default
+ * in result if present.
+ *
+ * 'opstrategy' if non-zero must be a btree strategy number.
+ *
+ * 'values' contains Datums indexed by the partition key to use for pruning.
+ *
+ * 'nvalues', number of Datums in 'values' array. Must be <= context->partnatts.
+ *
+ * 'partsupfunc' contains the range partitioning comparison functions to be
+ * used when calling partition_range_datum_bsearch or
+ * partition_rbound_datum_cmp.
+ *
+ * 'nullkeys' is the set of partition keys that are null.
+ */
+static PruneStepResult *
+get_matching_range_bounds(PartitionPruneContext *context,
+ int opstrategy, Datum *values, int nvalues,
+ FmgrInfo *partsupfunc, Bitmapset *nullkeys)
+{
+ PruneStepResult *result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ PartitionBoundInfo boundinfo = context->boundinfo;
+ Oid *partcollation = context->partcollation;
+ int partnatts = context->partnatts;
+ int *partindices = boundinfo->indexes;
+ int off,
+ minoff,
+ maxoff,
+ i;
+ bool is_equal;
+ bool inclusive = false;
+
+ Assert(context->strategy == PARTITION_STRATEGY_RANGE);
+ Assert(nvalues <= partnatts);
+
+ result->scan_null = result->scan_default = false;
+
+ /*
+ * If there are no datums to compare keys with, or if we got an IS NULL
+ * clause just return the default partition, if it exists.
+ */
+ if (boundinfo->ndatums == 0 || !bms_is_empty(nullkeys))
+ {
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+
+ minoff = 0;
+ maxoff = boundinfo->ndatums;
+
+ /*
+ * If there are no values to compare with the datums in boundinfo, it
+ * means the caller asked for partitions for all non-null datums. Add
+ * indexes of *all* partitions, including the default partition if one
+ * exists.
+ */
+ if (nvalues == 0)
+ {
+ if (partindices[minoff] < 0)
+ minoff++;
+ if (partindices[maxoff] < 0)
+ maxoff--;
+
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+ }
+
+ /*
+ * If the query does not constrain all key columns, we'll need to scan the
+ * default partition, if any.
+ */
+ if (nvalues < partnatts)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ switch (opstrategy)
+ {
+ case BTEqualStrategyNumber:
+ /* Look for the smallest bound that is = lookup value. */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+
+ if (off >= 0 && is_equal)
+ {
+ if (nvalues == partnatts)
+ {
+ /* There can only be zero or one matching partition. */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ int saved_off = off;
+
+ /*
+ * Since the lookup value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ */
+
+ /*
+ * First find greatest bound that's smaller than the
+ * lookup value.
+ */
+ while (off >= 1)
+ {
+ int32 cmpval;
+
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off - 1],
+ boundinfo->kind[off - 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off--;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * We can treat 'off' as the offset of the smallest bound
+ * to be included in the result, if we know it is the
+ * upper bound of the partition in which the lookup value
+ * could possibly exist. One case it couldn't is if the
+ * bound, or precisely the matched portion of its prefix,
+ * is not inclusive.
+ */
+ if (boundinfo->kind[off][nvalues] ==
+ PARTITION_RANGE_DATUM_MINVALUE)
+ off++;
+
+ minoff = off;
+
+ /*
+ * Now find smallest bound that's greater than the lookup
+ * value.
+ */
+ off = saved_off;
+ while (off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off + 1],
+ boundinfo->kind[off + 1],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+ off++;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ /*
+ * off + 1 would then be the offset of the greatest bound
+ * to be included in the result.
+ */
+ maxoff = off + 1;
+ }
+
+ /*
+ * Skip if minoff/maxoff are actually the upper bound of an
+ * unassigned portion of values.
+ */
+ if (partindices[minoff] < 0 && minoff < boundinfo->ndatums)
+ minoff++;
+ if (partindices[maxoff] < 0 && maxoff >= 1)
+ maxoff--;
+
+ /*
+ * There may exist a range of values unassigned to any
+ * non-default partition between the datums at minoff and
+ * maxoff. Add the default partition in that case.
+ */
+ if (partition_bound_has_default(boundinfo))
+ {
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+ }
+ else if (off >= 0) /* !is_equal */
+ {
+ /*
+ * The lookup value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest bound
+ * that is <= lookup value, so add off + 1 to the result
+ * instead as the offset of the upper bound of the only
+ * partition that may contain the lookup value.
+ */
+ if (partindices[off + 1] >= 0)
+ result->bound_offsets = bms_make_singleton(off + 1);
+ else
+ result->scan_default =
+ partition_bound_has_default(boundinfo);
+ }
+ else
+ {
+ /*
+ * off < 0: the lookup value is smaller than all bounds, so
+ * only the default partition qualifies, if there is one.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ }
+
+ return result;
+
+ case BTGreaterEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTGreaterStrategyNumber:
+
+ /*
+ * Look for the smallest bound that is > or >= lookup value and
+ * set minoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the lookup value, so include
+ * all of them in the result.
+ */
+ minoff = 0;
+ }
+ else
+ {
+ if (is_equal && nvalues < partnatts)
+ {
+ /*
+ * Since the lookup value contains only a prefix of keys,
+ * we must find other bounds that may also match the
+ * prefix. partition_range_datum_bsearch() returns the
+ * offset of one of them, find others by checking adjacent
+ * bounds.
+ *
+ * Based on whether the lookup values are inclusive or
+ * not, we must either include the indexes of all such
+ * bounds in the result (that is, set minoff to the index
+ * of smallest such bound) or find the smallest one that's
+ * greater than the lookup values and set minoff to that.
+ */
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off - 1 : off + 1;
+ cmpval =
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ minoff = inclusive ? off : off + 1;
+ }
+
+ /*
+ * lookup value falls in the range between some bounds in
+ * boundinfo. off would be the offset of the greatest bound
+ * that is <= lookup value, so add off + 1 to the result
+ * instead as the offset of the upper bound of the smallest
+ * partition that may contain the lookup value.
+ */
+ else
+ minoff = off + 1;
+ }
+ break;
+
+ case BTLessEqualStrategyNumber:
+ inclusive = true;
+ /* fall through */
+ case BTLessStrategyNumber:
+
+ /*
+ * Look for the greatest bound that is < or <= lookup value and
+ * set maxoff to its offset.
+ */
+ off = partition_range_datum_bsearch(partsupfunc,
+ partcollation,
+ boundinfo,
+ nvalues, values,
+ &is_equal);
+ if (off < 0)
+ {
+ /*
+ * All bounds are greater than the key, so we could only
+ * expect to find the lookup key in the default partition.
+ */
+ result->scan_default = partition_bound_has_default(boundinfo);
+ return result;
+ }
+ else
+ {
+ /*
+ * See the comment above.
+ */
+ if (is_equal && nvalues < partnatts)
+ {
+ while (off >= 1 && off < boundinfo->ndatums - 1)
+ {
+ int32 cmpval;
+ int nextoff;
+
+ nextoff = inclusive ? off + 1 : off - 1;
+ cmpval = partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[nextoff],
+ boundinfo->kind[nextoff],
+ values, nvalues);
+ if (cmpval != 0)
+ break;
+
+ off = nextoff;
+ }
+
+ Assert(0 ==
+ partition_rbound_datum_cmp(partsupfunc,
+ partcollation,
+ boundinfo->datums[off],
+ boundinfo->kind[off],
+ values, nvalues));
+
+ maxoff = inclusive ? off + 1 : off;
+ }
+
+ /*
+ * The lookup value falls in the range between some bounds in
+ * boundinfo. 'off' would be the offset of the greatest bound
+ * that is <= lookup value, so add off + 1 to the result
+ * instead as the offset of the upper bound of the greatest
+ * partition that may contain lookup value. If the lookup
+ * value had exactly matched the bound, but it isn't
+ * inclusive, no need to add the adjacent partition.
+ */
+ else if (!is_equal || inclusive)
+ maxoff = off + 1;
+ else
+ maxoff = off;
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid strategy number %d", opstrategy);
+ break;
+ }
+
+ /*
+ * Skip a gap and when doing so, check if the bound contains a finite
+ * value to decide if we need to add the default partition. If it's an
+ * infinite bound, we need not add the default partition, as having an
+ * infinite bound means the partition in question catches any values that
+ * would otherwise be in the default partition.
+ */
+ if (partindices[minoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (minoff >= 0 &&
+ minoff < boundinfo->ndatums &&
+ boundinfo->kind[minoff][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ minoff++;
+ }
+
+ /*
+ * Skip a gap. See the above comment about how we decide whether or
+ * not to scan the default partition based on whether the datum that will
+ * become the maximum datum is finite or not.
+ */
+ if (maxoff >= 1 && partindices[maxoff] < 0)
+ {
+ int lastkey = nvalues - 1;
+
+ if (maxoff >= 0 &&
+ maxoff <= boundinfo->ndatums &&
+ boundinfo->kind[maxoff - 1][lastkey] ==
+ PARTITION_RANGE_DATUM_VALUE)
+ result->scan_default = partition_bound_has_default(boundinfo);
+
+ maxoff--;
+ }
+
+ if (partition_bound_has_default(boundinfo))
+ {
+ /*
+ * There may exist a range of values unassigned to any non-default
+ * partition between the datums at minoff and maxoff. Add the default
+ * partition in that case.
+ */
+ for (i = minoff; i <= maxoff; i++)
+ {
+ if (partindices[i] < 0)
+ {
+ result->scan_default = true;
+ break;
+ }
+ }
+ }
+
+ Assert(minoff >= 0 && maxoff >= 0);
+ if (minoff <= maxoff)
+ result->bound_offsets = bms_add_range(NULL, minoff, maxoff);
+
+ return result;
+}
+
+/*
+ * perform_pruning_base_step
+ * Determines the indexes of datums that satisfy conditions specified in
+ * 'opstep'.
+ *
+ * Result also contains whether special null-accepting and/or default
+ * partition need to be scanned.
+ */
+static PruneStepResult *
+perform_pruning_base_step(PartitionPruneContext *context,
+ PartitionPruneStepOp *opstep)
+{
+ ListCell *lc1,
+ *lc2;
+ int keyno,
+ nvalues;
+ Datum values[PARTITION_MAX_KEYS];
+ FmgrInfo partsupfunc[PARTITION_MAX_KEYS];
+
+ /*
+ * There better be the same number of expressions and compare functions.
+ */
+ Assert(list_length(opstep->exprs) == list_length(opstep->cmpfns));
+
+ nvalues = 0;
+ lc1 = list_head(opstep->exprs);
+ lc2 = list_head(opstep->cmpfns);
+
+ /*
+ * Generate the partition lookup key that will be used by one of the
+ * get_matching_*_bounds functions called below.
+ */
+ for (keyno = 0; keyno < context->partnatts; keyno++)
+ {
+ /*
+ * For hash partitioning, it is possible that values of some keys are
+ * not provided in operator clauses, but instead the planner found
+ * that they appeared in an IS NULL clause.
+ */
+ if (bms_is_member(keyno, opstep->nullkeys))
+ continue;
+
+ /*
+ * For range partitioning, we must only perform pruning with values
+ * for either all partition keys or a prefix thereof.
+ */
+ if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
+ break;
+
+ if (lc1 != NULL)
+ {
+ Expr *expr;
+ Datum datum;
+
+ expr = lfirst(lc1);
+ if (partkey_datum_from_expr(context, expr, &datum))
+ {
+ Oid cmpfn;
+
+ /*
+ * If we're going to need a different comparison function than
+ * the one cached in the PartitionKey, we'll need to look up
+ * the FmgrInfo.
+ */
+ cmpfn = lfirst_oid(lc2);
+ Assert(OidIsValid(cmpfn));
+ if (cmpfn != context->partsupfunc[keyno].fn_oid)
+ fmgr_info(cmpfn, &partsupfunc[keyno]);
+ else
+ fmgr_info_copy(&partsupfunc[keyno],
+ &context->partsupfunc[keyno],
+ CurrentMemoryContext);
+
+ values[keyno] = datum;
+ nvalues++;
+ }
+
+ lc1 = lnext(lc1);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ switch (context->strategy)
+ {
+ case PARTITION_STRATEGY_HASH:
+ return get_matching_hash_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_LIST:
+ return get_matching_list_bounds(context,
+ opstep->opstrategy,
+ values[0], nvalues,
+ &partsupfunc[0],
+ opstep->nullkeys);
+
+ case PARTITION_STRATEGY_RANGE:
+ return get_matching_range_bounds(context,
+ opstep->opstrategy,
+ values, nvalues,
+ partsupfunc,
+ opstep->nullkeys);
+
+ default:
+ elog(ERROR, "unexpected partition strategy: %d",
+ (int) context->strategy);
+ break;
+ }
+
+ return NULL;
+}
+
+/*
+ * perform_pruning_combine_step
+ * Determines the indexes of datums obtained by combining those given
+ * by the steps identified by cstep->source_stepids using the specified
+ * combination method
+ *
+ * Since cstep may refer to the result of earlier steps, we also receive
+ * step_results here.
+ */
+static PruneStepResult *
+perform_pruning_combine_step(PartitionPruneContext *context,
+ PartitionPruneStepCombine *cstep,
+ PruneStepResult **step_results)
+{
+ ListCell *lc1;
+ PruneStepResult *result = NULL;
+ bool firststep;
+
+ /*
+ * A combine step without any source steps is an indication to not perform
+ * any partition pruning; we just return all partitions.
+ */
+ result = (PruneStepResult *) palloc0(sizeof(PruneStepResult));
+ if (list_length(cstep->source_stepids) == 0)
+ {
+ PartitionBoundInfo boundinfo = context->boundinfo;
+
+ result->bound_offsets = bms_add_range(NULL, 0, boundinfo->ndatums - 1);
+ result->scan_default = partition_bound_has_default(boundinfo);
+ result->scan_null = partition_bound_accepts_nulls(boundinfo);
+ return result;
+ }
+
+ switch (cstep->combineOp)
+ {
+ case PARTPRUNE_COMBINE_UNION:
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ /*
+ * step_results[step_id] must contain a valid result, which is
+ * confirmed by the fact that cstep's step_id is greater than
+ * step_id and the fact that results of the individual steps
+ * are evaluated in sequence of their step_ids.
+ */
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ /* Record any additional datum indexes from this step */
+ result->bound_offsets = bms_add_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (!result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (!result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ break;
+
+ case PARTPRUNE_COMBINE_INTERSECT:
+ firststep = true;
+ foreach(lc1, cstep->source_stepids)
+ {
+ int step_id = lfirst_int(lc1);
+ PruneStepResult *step_result;
+
+ if (step_id >= cstep->step.step_id)
+ elog(ERROR, "invalid pruning combine step argument");
+ step_result = step_results[step_id];
+ Assert(step_result != NULL);
+
+ if (firststep)
+ {
+ /* Copy step's result the first time. */
+ result->bound_offsets = step_result->bound_offsets;
+ result->scan_null = step_result->scan_null;
+ result->scan_default = step_result->scan_default;
+ firststep = false;
+ }
+ else
+ {
+ /* Record datum indexes common to both steps */
+ result->bound_offsets =
+ bms_int_members(result->bound_offsets,
+ step_result->bound_offsets);
+
+ /* Update whether to scan null and default partitions. */
+ if (result->scan_null)
+ result->scan_null = step_result->scan_null;
+ if (result->scan_default)
+ result->scan_default = step_result->scan_default;
+ }
+ }
+ break;
+
+ default:
+ elog(ERROR, "invalid pruning combine op: %d",
+ (int) cstep->combineOp);
+ }
+
+ return result;
+}
+
+/*
+ * match_boolean_partition_clause
+ *
+ * Sets *outconst to a Const containing true or false value and returns true if
+ * we're able to match the clause to the partition key as a specially-shaped
+ * Boolean clause. Returns false otherwise with *outconst set to NULL.
+ */
+static bool
+match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
+ Expr **outconst)
+{
+ Expr *leftop;
+
+ *outconst = NULL;
+
+ if (!IsBooleanOpfamily(partopfamily))
+ return false;
+
+ if (IsA(clause, BooleanTest))
+ {
+ BooleanTest *btest = (BooleanTest *) clause;
+
+ /* Only IS [NOT] TRUE/FALSE are any good to us */
+ if (btest->booltesttype == IS_UNKNOWN ||
+ btest->booltesttype == IS_NOT_UNKNOWN)
+ return false;
+
+ leftop = btest->arg;
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ if (equal(leftop, partkey))
+ *outconst = (btest->booltesttype == IS_TRUE ||
+ btest->booltesttype == IS_NOT_FALSE)
+ ? (Expr *) makeBoolConst(true, false)
+ : (Expr *) makeBoolConst(false, false);
+
+ if (*outconst)
+ return true;
+ }
+ else
+ {
+ bool is_not_clause = not_clause((Node *) clause);
+
+ leftop = is_not_clause ? get_notclausearg(clause) : clause;
+
+ if (IsA(leftop, RelabelType))
+ leftop = ((RelabelType *) leftop)->arg;
+
+ /* Compare to the partition key, and make up a clause ... */
+ if (equal(leftop, partkey))
+ *outconst = is_not_clause ?
+ (Expr *) makeBoolConst(false, false) :
+ (Expr *) makeBoolConst(true, false);
+ else if (equal(negate_clause((Node *) leftop), partkey))
+ *outconst = (Expr *) makeBoolConst(false, false);
+
+ if (*outconst)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * partkey_datum_from_expr
+ * Evaluate 'expr', set *value to the resulting Datum. Return true if
+ * evaluation was possible, otherwise false.
+ */
+static bool
+partkey_datum_from_expr(PartitionPruneContext *context,
+ Expr *expr, Datum *value)
+{
+ switch (nodeTag(expr))
+ {
+ case T_Const:
+ *value = ((Const *) expr)->constvalue;
+ return true;
+
+ default:
+ break;
+ }
+
+ return false;
+}
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
index cd15faa7a1..b25e25bf9d 100644
--- a/src/include/catalog/partition.h
+++ b/src/include/catalog/partition.h
@@ -26,7 +26,7 @@
* PartitionBoundInfo encapsulates a set of partition bounds. It is usually
* associated with partitioned tables as part of its partition descriptor.
*
- * The internal structure is opaque outside partition.c.
+ * The internal structure appears in partbounds.h.
*/
typedef struct PartitionBoundInfoData *PartitionBoundInfo;
@@ -70,7 +70,6 @@ extern void check_default_allows_bound(Relation parent, Relation defaultRel,
PartitionBoundSpec *new_spec);
extern List *get_proposed_default_constraint(List *new_part_constaints);
-/* For tuple routing */
extern int get_partition_for_tuple(Relation relation, Datum *values,
bool *isnull);
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index b544474254..5b20dd77a1 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -53,6 +53,9 @@ typedef FormData_pg_opfamily *Form_pg_opfamily;
#define Anum_pg_opfamily_opfnamespace 3
#define Anum_pg_opfamily_opfowner 4
+#define IsBooleanOpfamily(opfamily) \
+ ((opfamily) == BOOL_BTREE_FAM_OID || (opfamily) == BOOL_HASH_FAM_OID)
+
/* ----------------
* initial contents of pg_opfamily
* ----------------
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index b1e3d53f78..4fc2de7184 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -193,6 +193,9 @@ typedef enum NodeTag
T_FromExpr,
T_OnConflictExpr,
T_IntoClause,
+ T_PartitionPruneStep,
+ T_PartitionPruneStepOp,
+ T_PartitionPruneStepCombine,
/*
* TAGS FOR EXPRESSION STATE NODES (execnodes.h)
@@ -262,7 +265,6 @@ typedef enum NodeTag
T_PlaceHolderVar,
T_SpecialJoinInfo,
T_AppendRelInfo,
- T_PartitionedChildRelInfo,
T_PlaceHolderInfo,
T_MinMaxAggInfo,
T_PlannerParamItem,
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 1b4b0d75af..7c4540b261 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1506,4 +1506,78 @@ typedef struct OnConflictExpr
List *exclRelTlist; /* tlist of the EXCLUDED pseudo relation */
} OnConflictExpr;
+
+/*
+ * Node types to represent a partition pruning step.
+ */
+
+/*
+ * The base Node type. step_id is the global identifier of a given step
+ * within a given pruning context.
+ */
+typedef struct PartitionPruneStep
+{
+ NodeTag type;
+ int step_id;
+} PartitionPruneStep;
+
+/*----------
+ * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
+ * OpExpr clauses
+ *
+ * This contains information extracted from up to partnatts OpExpr clauses,
+ * where partnatts is the number of partition key columns. 'opstrategy' is the
+ * strategy of the operator in the clause matched to the last partition key.
+ * 'exprs' contains expressions which comprise the lookup key to be passed to
+ * the partition bound search function. 'cmpfns' contains the OIDs of the
+ * comparison functions used to compare the aforementioned expressions with
+ * partition bounds. 'exprs' and 'cmpfns' contain the same number of items,
+ * at most partnatts.
+ *
+ * Once we find the offset of a partition bound using the lookup key, we
+ * determine which partitions to include in the result based on the value of
+ * 'opstrategy'. For example, if it were equality, we'd return just the
+ * partition that would contain that key or a set of partitions if the key
+ * didn't consist of all partitioning columns. For non-equality strategies,
+ * we'd need to include other partitions as appropriate.
+ *
+ * 'nullkeys' is the set containing the offsets of the partition keys (0 to
+ * partnatts - 1) that were matched to an IS NULL clause. This is only
+ * considered for hash partitioning as we need to pass which keys are null
+ * to the hash partition bound search function. It is never possible to
+ * have an expression be present in 'exprs' for a given partition key and
+ * the corresponding bit set in 'nullkeys'.
+ *----------
+ */
+typedef struct PartitionPruneStepOp
+{
+ PartitionPruneStep step;
+
+ int opstrategy;
+ List *exprs;
+ List *cmpfns;
+ Bitmapset *nullkeys;
+} PartitionPruneStepOp;
+
+/*----------
+ * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
+ *
+ * For BoolExpr clauses, we combine the set of partitions determined for each
+ * of its argument clauses.
+ *----------
+ */
+typedef enum PartitionPruneCombineOp
+{
+ PARTPRUNE_COMBINE_UNION,
+ PARTPRUNE_COMBINE_INTERSECT
+} PartitionPruneCombineOp;
+
+typedef struct PartitionPruneStepCombine
+{
+ PartitionPruneStep step;
+
+ PartitionPruneCombineOp combineOp;
+ List *source_stepids;
+} PartitionPruneStepCombine;
+
#endif /* PRIMNODES_H */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a2dde70de5..acb8814924 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
#define RELATION_H
#include "access/sdir.h"
+#include "fmgr.h"
#include "lib/stringinfo.h"
#include "nodes/params.h"
#include "nodes/parsenodes.h"
@@ -253,8 +254,6 @@ typedef struct PlannerInfo
List *append_rel_list; /* list of AppendRelInfos */
- List *pcinfo_list; /* list of PartitionedChildRelInfos */
-
List *rowMarks; /* list of PlanRowMarks */
List *placeholder_list; /* list of PlaceHolderInfos */
@@ -319,6 +318,9 @@ typedef struct PlannerInfo
/* optional private data for join_search_hook, e.g., GEQO */
void *join_search_private;
+
+ /* Does this query modify any partition key columns? */
+ bool partColsUpdated;
} PlannerInfo;
@@ -356,6 +358,9 @@ typedef struct PartitionSchemeData
/* Cached information about partition key data types. */
int16 *parttyplen;
bool *parttypbyval;
+
+ /* Cached information about partition comparison functions. */
+ FmgrInfo *partsupfunc;
} PartitionSchemeData;
typedef struct PartitionSchemeData *PartitionScheme;
@@ -528,11 +533,15 @@ typedef struct PartitionSchemeData *PartitionScheme;
*
* If the relation is partitioned, these fields will be set:
*
- * part_scheme - Partitioning scheme of the relation
- * boundinfo - Partition bounds
- * nparts - Number of partitions
- * part_rels - RelOptInfos for each partition
- * partexprs, nullable_partexprs - Partition key expressions
+ * part_scheme - Partitioning scheme of the relation
+ * nparts - Number of partitions
+ * boundinfo - Partition bounds
+ * partition_qual - Partition constraint if not the root
+ * part_rels - RelOptInfos for each partition
+ * partexprs, nullable_partexprs - Partition key expressions
+ * partitioned_child_rels - RT indexes of unpruned partitions of
+ * this relation that are partitioned tables
+ * themselves
*
* Note: A base relation always has only one set of partition keys, but a join
* relation may have as many sets of partition keys as the number of relations
@@ -663,10 +672,12 @@ typedef struct RelOptInfo
PartitionScheme part_scheme; /* Partitioning scheme. */
int nparts; /* number of partitions */
struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
+ List *partition_qual; /* partition constraint */
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Non-nullable partition key expressions. */
List **nullable_partexprs; /* Nullable partition key expressions. */
+ List *partitioned_child_rels; /* List of RT indexes. */
} RelOptInfo;
/*
@@ -1686,7 +1697,7 @@ typedef struct ModifyTablePath
List *partitioned_rels;
bool partColsUpdated; /* some part key in hierarchy updated */
List *resultRelations; /* integer list of RT indexes */
- Index mergeTargetRelation;/* RT index of merge target relation */
+ Index mergeTargetRelation; /* RT index of merge target relation */
List *subpaths; /* Path(s) producing source data */
List *subroots; /* per-target-table PlannerInfos */
List *withCheckOptionLists; /* per-target-table WCO lists */
@@ -2122,27 +2133,6 @@ typedef struct AppendRelInfo
} AppendRelInfo;
/*
- * For a partitioned table, this maps its RT index to the list of RT indexes
- * of the partitioned child tables in the partition tree. We need to
- * separately store this information, because we do not create AppendRelInfos
- * for the partitioned child tables of a parent table, since AppendRelInfos
- * contain information that is unnecessary for the partitioned child tables.
- * The child_rels list must contain at least one element, because the parent
- * partitioned table is itself counted as a child.
- *
- * These structs are kept in the PlannerInfo node's pcinfo_list.
- */
-typedef struct PartitionedChildRelInfo
-{
- NodeTag type;
-
- Index parent_relid;
- List *child_rels;
- bool part_cols_updated; /* is the partition key of any of
- * the partitioned tables updated? */
-} PartitionedChildRelInfo;
-
-/*
* For each distinct placeholder expression generated during planning, we
* store a PlaceHolderInfo node in the PlannerInfo node's placeholder_list.
* This stores info that is needed centrally rather than in each copy of the
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 07a3bc0627..c090396e13 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,9 +59,4 @@ extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
extern int plan_create_index_workers(Oid tableOid, Oid indexOid);
-extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti,
- bool *part_cols_updated);
-extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
- Relids join_relids);
-
#endif /* PLANNER_H */
diff --git a/src/include/partitioning/partbounds.h b/src/include/partitioning/partbounds.h
new file mode 100644
index 0000000000..c76014d4a8
--- /dev/null
+++ b/src/include/partitioning/partbounds.h
@@ -0,0 +1,124 @@
+/*-------------------------------------------------------------------------
+ *
+ * partbounds.h
+ *
+ * Copyright (c) 2007-2018, PostgreSQL Global Development Group
+ *
+ * src/include/partitioning/partbounds.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTBOUNDS_H
+#define PARTBOUNDS_H
+
+#include "catalog/partition.h"
+
+
+/*
+ * PartitionBoundInfoData encapsulates a set of partition bounds. It is
+ * usually associated with partitioned tables as part of its partition
+ * descriptor, but may also be used to represent a virtual partitioned
+ * table such as a partitioned joinrel within the planner.
+ *
+ * A list partition datum that is known to be NULL is never put into the
+ * datums array. Instead, it is tracked using the null_index field.
+ *
+ * In the case of range partitioning, ndatums will typically be far less than
+ * 2 * nparts, because a partition's upper bound and the next partition's lower
+ * bound are the same in most common cases, and we only store one of them (the
+ * upper bound). In the case of hash partitioning, ndatums will be the same as
+ * the number of partitions.
+ *
+ * For range and list partitioned tables, datums is an array of datum-tuples
+ * with key->partnatts datums each. For hash partitioned tables, it is an array
+ * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
+ * given partition.
+ *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
+ * In the case of list partitioning, the indexes array stores one entry for
+ * every datum, which is the index of the partition that accepts a given datum.
+ * In case of range partitioning, it stores one entry per distinct range
+ * datum, which is the index of the partition for which a given datum
+ * is an upper bound. In the case of hash partitioning, the number of
+ * entries in the indexes array is the same as the greatest modulus amongst all
+ * partitions. For a given partition key datum-tuple, the index of the
+ * partition that would accept it is given by the entry at the position equal
+ * to the remainder produced when the hash value of the datum-tuple is divided
+ * by the greatest modulus.
+ */
+
+typedef struct PartitionBoundInfoData
+{
+ char strategy; /* hash, list or range? */
+ int ndatums; /* Length of the datums array that follows */
+ Datum **datums;
+ PartitionRangeDatumKind **kind; /* The kind of each range bound datum;
+ * NULL for hash and list partitioned
+ * tables */
+ int *indexes; /* Partition indexes */
+ int null_index; /* Index of the null-accepting partition; -1
+ * if there isn't one */
+ int default_index; /* Index of the default partition; -1 if there
+ * isn't one */
+} PartitionBoundInfoData;
+
+#define partition_bound_accepts_nulls(bi) ((bi)->null_index != -1)
+#define partition_bound_has_default(bi) ((bi)->default_index != -1)
+
+/*
+ * When qsort'ing partition bounds after reading from the catalog, each bound
+ * is represented with one of the following structs.
+ */
+
+/* One bound of a hash partition */
+typedef struct PartitionHashBound
+{
+ int modulus;
+ int remainder;
+ int index;
+} PartitionHashBound;
+
+/* One value coming from some (index'th) list partition */
+typedef struct PartitionListValue
+{
+ int index;
+ Datum value;
+} PartitionListValue;
+
+/* One bound of a range partition */
+typedef struct PartitionRangeBound
+{
+ int index;
+ Datum *datums; /* range bound datums */
+ PartitionRangeDatumKind *kind; /* the kind of each datum */
+ bool lower; /* this is the lower (vs upper) bound */
+} PartitionRangeBound;
+
+extern int get_hash_partition_greatest_modulus(PartitionBoundInfo b);
+extern int partition_list_bsearch(FmgrInfo *partsupfunc, Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ Datum value, bool *is_equal);
+extern int partition_range_bsearch(int partnatts, FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ PartitionRangeBound *probe, bool *is_equal);
+extern int partition_range_datum_bsearch(FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ PartitionBoundInfo boundinfo,
+ int nvalues, Datum *values, bool *is_equal);
+extern int partition_hash_bsearch(PartitionBoundInfo boundinfo,
+ int modulus, int remainder);
+extern uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc,
+ Datum *values, bool *isnull);
+extern int32 partition_rbound_datum_cmp(FmgrInfo *partsupfunc,
+ Oid *partcollation,
+ Datum *rb_datums, PartitionRangeDatumKind *rb_kind,
+ Datum *tuple_datums, int n_tuple_datums);
+
+#endif /* PARTBOUNDS_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
new file mode 100644
index 0000000000..b272ae9a90
--- /dev/null
+++ b/src/include/partitioning/partprune.h
@@ -0,0 +1,49 @@
+/*-------------------------------------------------------------------------
+ *
+ * partprune.h
+ * prototypes for partprune.c
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/partitioning/partprune.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PARTPRUNE_H
+#define PARTPRUNE_H
+
+#include "catalog/partition.h"
+#include "nodes/relation.h"
+
+/*
+ * PartitionPruneContext
+ *
+ * Information about a partitioned table needed to perform partition pruning.
+ */
+typedef struct PartitionPruneContext
+{
+ /* Partition key information */
+ char strategy;
+ int partnatts;
+ Oid *partopfamily;
+ Oid *partopcintype;
+ Oid *partcollation;
+ FmgrInfo *partsupfunc;
+
+ /* Number of partitions */
+ int nparts;
+
+ /* Partition boundary info */
+ PartitionBoundInfo boundinfo;
+} PartitionPruneContext;
+
+
+extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
+ List *pruning_steps);
+extern List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ bool *constfalse);
+
+#endif /* PARTPRUNE_H */
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index 5e57b9a465..b2b912ed5c 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1951,11 +1951,13 @@ explain (costs off) select * from mcrparted where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted_def
Filter: (abs(b) = 5)
-(13 rows)
+(15 rows)
explain (costs off) select * from mcrparted where a > -1; -- scans all partitions
QUERY PLAN
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 09517775b6..2d77b3edd4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -208,16 +208,14 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
(3 rows)
explain (costs off) select * from rlp where a <= 1;
- QUERY PLAN
----------------------------------------
+ QUERY PLAN
+--------------------------
Append
-> Seq Scan on rlp1
Filter: (a <= 1)
-> Seq Scan on rlp2
Filter: (a <= 1)
- -> Seq Scan on rlp_default_default
- Filter: (a <= 1)
-(7 rows)
+(5 rows)
explain (costs off) select * from rlp where a = 1;
QUERY PLAN
@@ -235,7 +233,7 @@ explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
Filter: (a = '1'::bigint)
(3 rows)
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
QUERY PLAN
-----------------------------------------------
Append
@@ -265,9 +263,11 @@ explain (costs off) select * from rlp where a = 1::numeric; /* only null can be
Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_30
Filter: ((a)::numeric = '1'::numeric)
+ -> Seq Scan on rlp_default_null
+ Filter: ((a)::numeric = '1'::numeric)
-> Seq Scan on rlp_default_default
Filter: ((a)::numeric = '1'::numeric)
-(29 rows)
+(31 rows)
explain (costs off) select * from rlp where a <= 10;
QUERY PLAN
@@ -575,7 +575,9 @@ explain (costs off) select * from rlp where a > 20 and a < 27;
Filter: ((a > 20) AND (a < 27))
-> Seq Scan on rlp4_default
Filter: ((a > 20) AND (a < 27))
-(7 rows)
+ -> Seq Scan on rlp_default_default
+ Filter: ((a > 20) AND (a < 27))
+(9 rows)
explain (costs off) select * from rlp where a = 29;
QUERY PLAN
@@ -714,9 +716,7 @@ explain (costs off) select * from mc3p where a = 1 and abs(b) = 1 and c < 8;
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-> Seq Scan on mc3p1
Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
- -> Seq Scan on mc3p_default
- Filter: ((c < 8) AND (a = 1) AND (abs(b) = 1))
-(7 rows)
+(5 rows)
explain (costs off) select * from mc3p where a = 10 and abs(b) between 5 and 35;
QUERY PLAN
@@ -892,6 +892,8 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p2
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
+ -> Seq Scan on mc3p3
+ Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p4
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p5
@@ -902,7 +904,7 @@ explain (costs off) select * from mc3p where a = 1 or abs(b) = 1 or c = 1;
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-> Seq Scan on mc3p_default
Filter: ((a = 1) OR (abs(b) = 1) OR (c = 1))
-(17 rows)
+(19 rows)
explain (costs off) select * from mc3p where (a = 1 and abs(b) = 1) or (a = 10 and abs(b) = 10);
QUERY PLAN
@@ -1007,24 +1009,20 @@ explain (costs off) select * from boolpart where a in (true, false);
(5 rows)
explain (costs off) select * from boolpart where a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (NOT a)
- -> Seq Scan on boolpart_default
- Filter: (NOT a)
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where not a = false;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+------------------------------
Append
-> Seq Scan on boolpart_t
Filter: a
- -> Seq Scan on boolpart_default
- Filter: a
-(5 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is true or a is not true;
QUERY PLAN
@@ -1034,33 +1032,22 @@ explain (costs off) select * from boolpart where a is true or a is not true;
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-> Seq Scan on boolpart_t
Filter: ((a IS TRUE) OR (a IS NOT TRUE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS TRUE) OR (a IS NOT TRUE))
-(7 rows)
+(5 rows)
explain (costs off) select * from boolpart where a is not true;
- QUERY PLAN
-------------------------------------
+ QUERY PLAN
+---------------------------------
Append
-> Seq Scan on boolpart_f
Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_t
- Filter: (a IS NOT TRUE)
- -> Seq Scan on boolpart_default
- Filter: (a IS NOT TRUE)
-(7 rows)
+(3 rows)
explain (costs off) select * from boolpart where a is not true and a is not false;
- QUERY PLAN
---------------------------------------------------------
- Append
- -> Seq Scan on boolpart_f
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_t
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
- -> Seq Scan on boolpart_default
- Filter: ((a IS NOT TRUE) AND (a IS NOT FALSE))
-(7 rows)
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
explain (costs off) select * from boolpart where a is unknown;
QUERY PLAN
@@ -1086,4 +1073,446 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p2 t2_2
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p3 t2_3
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p4 t2_4
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p5 t2_5
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p6 t2_6
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p7 t2_7
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_8
+ Filter: ((a = t1.b) AND (c = 1) AND (abs(b) = 1))
+(28 rows)
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+ QUERY PLAN
+-----------------------------------------------------------------------
+ Nested Loop
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p0 t2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p1 t2_1
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+ -> Seq Scan on mc3p_default t2_2
+ Filter: ((c = t1.b) AND (a = 1) AND (abs(b) = 1))
+(16 rows)
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+ QUERY PLAN
+--------------------------------------------------------------------
+ Nested Loop
+ -> Aggregate
+ -> Append
+ -> Seq Scan on mc3p1 t2
+ Filter: ((a = 1) AND (c = 1) AND (abs(b) = 1))
+ -> Append
+ -> Seq Scan on mc2p1 t1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p2 t1_1
+ Filter: (a = 1)
+ -> Seq Scan on mc2p_default t1_2
+ Filter: (a = 1)
+(12 rows)
+
+--
+-- pruning with clauses containing <> operator
+--
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+explain (costs off) select * from rp where a <> 1;
+ QUERY PLAN
+--------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: (a <> 1)
+ -> Seq Scan on rp1
+ Filter: (a <> 1)
+ -> Seq Scan on rp2
+ Filter: (a <> 1)
+(7 rows)
+
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+ QUERY PLAN
+-----------------------------------------
+ Append
+ -> Seq Scan on rp0
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp1
+ Filter: ((a <> 1) AND (a <> 2))
+ -> Seq Scan on rp2
+ Filter: ((a <> 1) AND (a <> 2))
+(7 rows)
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on lp_ad
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_bc
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_ef
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_g
+ Filter: (a <> 'a'::bpchar)
+ -> Seq Scan on lp_default
+ Filter: (a <> 'a'::bpchar)
+(11 rows)
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on lp_bc
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_ef
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_g
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_null
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+ -> Seq Scan on lp_default
+ Filter: (((a <> 'a'::bpchar) AND (a <> 'd'::bpchar)) OR (a IS NULL))
+(11 rows)
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on rlp3efgh
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+ -> Seq Scan on rlp3_default
+ Filter: ((b IS NOT NULL) AND ((b)::text <> 'ab'::text) AND ((b)::text <> 'cd'::text) AND ((b)::text <> 'xy'::text) AND (a = 15))
+(5 rows)
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+ -> Seq Scan on coll_pruning_multi3
+ Filter: (substr(a, 1) = 'e'::text COLLATE "C")
+(7 rows)
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi1
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+ -> Seq Scan on coll_pruning_multi2
+ Filter: (substr(a, 1) = 'a'::text COLLATE "POSIX")
+(5 rows)
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+ QUERY PLAN
+---------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coll_pruning_multi2
+ Filter: ((substr(a, 1) = 'e'::text COLLATE "C") AND (substr(a, 1) = 'a'::text COLLATE "POSIX"))
+(3 rows)
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+ QUERY PLAN
+------------------------------------
+ Append
+ -> Seq Scan on like_op_noprune1
+ Filter: (a ~~ '%BC'::text)
+ -> Seq Scan on like_op_noprune2
+ Filter: (a ~~ '%BC'::text)
+(5 rows)
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on rparted_by_int2_maxvalue
+ Filter: (a > '100000000000000'::bigint)
+(3 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d4ef192fcd..ad5177715c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -60,7 +60,7 @@ explain (costs off) select * from rlp where 1 > a; /* commuted */
explain (costs off) select * from rlp where a <= 1;
explain (costs off) select * from rlp where a = 1;
explain (costs off) select * from rlp where a = 1::bigint; /* same as above */
-explain (costs off) select * from rlp where a = 1::numeric; /* only null can be pruned */
+explain (costs off) select * from rlp where a = 1::numeric; /* no pruning */
explain (costs off) select * from rlp where a <= 10;
explain (costs off) select * from rlp where a > 10;
explain (costs off) select * from rlp where a < 15;
@@ -152,4 +152,125 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart;
+--
+-- some more cases
+--
+
+--
+-- pruning for partitioned table appearing inside a sub-query
+--
+-- pruning won't work for mc3p, because some keys are Params
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = t1.b and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+-- pruning should work fine, because values for a prefix of keys (a, b) are
+-- available
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.c = t1.b and abs(t2.b) = 1 and t2.a = 1) s where t1.a = 1;
+
+-- also here, because values for all keys are provided
+explain (costs off) select * from mc2p t1, lateral (select count(*) from mc3p t2 where t2.a = 1 and abs(t2.b) = 1 and t2.c = 1) s where t1.a = 1;
+
+--
+-- pruning with clauses containing <> operator
+--
+
+-- doesn't prune range partitions
+create table rp (a int) partition by range (a);
+create table rp0 partition of rp for values from (minvalue) to (1);
+create table rp1 partition of rp for values from (1) to (2);
+create table rp2 partition of rp for values from (2) to (maxvalue);
+
+explain (costs off) select * from rp where a <> 1;
+explain (costs off) select * from rp where a <> 1 and a <> 2;
+
+-- null partition should be eliminated due to strict <> clause.
+explain (costs off) select * from lp where a <> 'a';
+
+-- ensure we detect contradictions in clauses; a can't be NULL and NOT NULL.
+explain (costs off) select * from lp where a <> 'a' and a is null;
+explain (costs off) select * from lp where (a <> 'a' and a <> 'd') or a is null;
+
+-- check that it also works for a partitioned table that's not root,
+-- which in this case are partitions of rlp that are themselves
+-- list-partitioned on b
+explain (costs off) select * from rlp where a = 15 and b <> 'ab' and b <> 'cd' and b <> 'xy' and b is not null;
+
+--
+-- different collations for different keys with same expression
+--
+create table coll_pruning_multi (a text) partition by range (substr(a, 1) collate "POSIX", substr(a, 1) collate "C");
+create table coll_pruning_multi1 partition of coll_pruning_multi for values from ('a', 'a') to ('a', 'e');
+create table coll_pruning_multi2 partition of coll_pruning_multi for values from ('a', 'e') to ('a', 'z');
+create table coll_pruning_multi3 partition of coll_pruning_multi for values from ('b', 'a') to ('b', 'e');
+
+-- no pruning, because no value for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C";
+
+-- pruning, with a value provided for the leading key
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'a' collate "POSIX";
+
+-- pruning, with values provided for both keys
+explain (costs off) select * from coll_pruning_multi where substr(a, 1) = 'e' collate "C" and substr(a, 1) = 'a' collate "POSIX";
+
+--
+-- LIKE operators don't prune
+--
+create table like_op_noprune (a text) partition by list (a);
+create table like_op_noprune1 partition of like_op_noprune for values in ('ABC');
+create table like_op_noprune2 partition of like_op_noprune for values in ('BCD');
+explain (costs off) select * from like_op_noprune where a like '%BC';
+
+--
+-- tests wherein clause value requires a cross-type comparison function
+--
+create table lparted_by_int2 (a smallint) partition by list (a);
+create table lparted_by_int2_1 partition of lparted_by_int2 for values in (1);
+create table lparted_by_int2_16384 partition of lparted_by_int2 for values in (16384);
+explain (costs off) select * from lparted_by_int2 where a = 100000000000000;
+
+create table rparted_by_int2 (a smallint) partition by range (a);
+create table rparted_by_int2_1 partition of rparted_by_int2 for values from (1) to (10);
+create table rparted_by_int2_16384 partition of rparted_by_int2 for values from (10) to (16384);
+-- all partitions pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values from (16384) to (maxvalue);
+-- all partitions but rparted_by_int2_maxvalue pruned
+explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+
+-- hash partitioning
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
On Sat, Apr 7, 2018 at 1:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's my proposed patch.
Idle thought: how about renaming the "constfalse" argument and variables
to "contradictory" or maybe just "contradict"?
Sounds fine to me.
Thanks,
Amit
Hi Alvaro,
On 04/06/2018 12:41 PM, Alvaro Herrera wrote:
Here's my proposed patch.
Idle thought: how about renaming the "constfalse" argument and variables
to "contradictory" or maybe just "contradict"?
Passes check-world.
New directories, and variable rename seems like a good idea; either is ok.
Best regards,
Jesper
So I pushed this 25 minutes ago, and already there's a couple of
buildfarm members complaining:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2019%3A55%3A07
Both show exactly the same diff in test partition_prune:
*** /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out Fri Apr 6 15:55:08 2018
--- /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out Fri Apr 6 16:01:40 2018
***************
*** 1348,1357 ****
----------+----+-----
hp0 | |
hp0 | 1 |
! hp0 | 1 | xxx
hp3 | 10 | yyy
! hp1 | | xxx
! hp2 | 10 | xxx
(6 rows)
-- partial keys won't prune, nor would non-equality conditions
--- 1348,1357 ----
----------+----+-----
hp0 | |
hp0 | 1 |
! hp0 | 10 | xxx
! hp3 | | xxx
hp3 | 10 | yyy
! hp2 | 1 | xxx
(6 rows)
-- partial keys won't prune, nor would non-equality conditions
***************
*** 1460,1466 ****
QUERY PLAN
-------------------------------------------------
Append
! -> Seq Scan on hp0
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
--- 1460,1466 ----
QUERY PLAN
-------------------------------------------------
Append
! -> Seq Scan on hp2
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1468,1474 ****
QUERY PLAN
-----------------------------------------------------
Append
! -> Seq Scan on hp1
Filter: ((a IS NULL) AND (b = 'xxx'::text))
(3 rows)
--- 1468,1474 ----
QUERY PLAN
-----------------------------------------------------
Append
! -> Seq Scan on hp3
Filter: ((a IS NULL) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1476,1482 ****
QUERY PLAN
--------------------------------------------------
Append
! -> Seq Scan on hp2
Filter: ((a = 10) AND (b = 'xxx'::text))
(3 rows)
--- 1476,1482 ----
QUERY PLAN
--------------------------------------------------
Append
! -> Seq Scan on hp0
Filter: ((a = 10) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1494,1504 ****
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
! (7 rows)
-- hash partitiong pruning doesn't occur with <> operator clauses
explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
--- 1494,1502 ----
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
! (5 rows)
-- hash partitiong pruning doesn't occur with <> operator clauses
explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-04-06 17:28:00 -0300, Alvaro Herrera wrote:
So I pushed this 25 minutes ago, and already there's a couple of
buildfarm members complaining:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2019%3A55%3A07
Both show exactly the same diff in test partition_prune:
There's also
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-04-06%2020%3A45%3A01
*** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800
--- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700
***************
*** 32,40 ****
(6 rows)
SELECT * FROM t1p WHERE o > 50 AND p like '%64%';
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p"
--- 32,37 ----
seems you just need to remove those rows from the expected file.
- Andres
On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I don't actually like very much the idea of putting all this code in
optimizer/util. This morning it occurred to me that we should create a new
src/backend/partitioning/ (and a src/include/partitioning/ to go with
it) and drop a bunch of files there. Even your proposed new partcache.c
will seem misplaced *anywhere*, since it contains support code to be
used by both planner and executor; in src/{backend,include}/partitioning
it will be able to serve both without it being a modularity wart.
Uh, what?
Surely partcache.c is correctly placed next to relcache.c and
syscache.c and everything else in src/backend/utils/cache.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
On Fri, Apr 6, 2018 at 8:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I don't actually like very much the idea of putting all this code in
optimizer/util. This morning it occurred to me that we should create a new
src/backend/partitioning/ (and a src/include/partitioning/ to go with
it) and drop a bunch of files there. Even your proposed new partcache.c
will seem misplaced *anywhere*, since it contains support code to be
used by both planner and executor; in src/{backend,include}/partitioning
it will be able to serve both without it being a modularity wart.
Uh, what?
Surely partcache.c is correctly placed next to relcache.c and
syscache.c and everything else in src/backend/utils/cache.
Frankly, I'm not real sure about partcache.c yet. Are you?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thank you Alvaro for the rest of the cleanup and for committing.
On Sat, Apr 7, 2018 at 5:28 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
So I pushed this 25 minutes ago, and already there's a couple of
buildfarm members complaining:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2019%3A55%3A07
Both show exactly the same diff in test partition_prune:
*** /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/test/regress/expected/partition_prune.out Fri Apr 6 15:55:08 2018
--- /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/src/test/regress/results/partition_prune.out Fri Apr 6 16:01:40 2018
***************
*** 1348,1357 ****
----------+----+-----
hp0 | |
hp0 | 1 |
! hp0 | 1 | xxx
hp3 | 10 | yyy
! hp1 | | xxx
! hp2 | 10 | xxx
(6 rows)
-- partial keys won't prune, nor would non-equality conditions
--- 1348,1357 ----
----------+----+-----
hp0 | |
hp0 | 1 |
! hp0 | 10 | xxx
! hp3 | | xxx
hp3 | 10 | yyy
! hp2 | 1 | xxx
(6 rows)
-- partial keys won't prune, nor would non-equality conditions
***************
*** 1460,1466 ****
QUERY PLAN
-------------------------------------------------
Append
! -> Seq Scan on hp0
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
--- 1460,1466 ----
QUERY PLAN
-------------------------------------------------
Append
! -> Seq Scan on hp2
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1468,1474 ****
QUERY PLAN
-----------------------------------------------------
Append
! -> Seq Scan on hp1
Filter: ((a IS NULL) AND (b = 'xxx'::text))
(3 rows)
--- 1468,1474 ----
QUERY PLAN
-----------------------------------------------------
Append
! -> Seq Scan on hp3
Filter: ((a IS NULL) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1476,1482 ****
QUERY PLAN
--------------------------------------------------
Append
! -> Seq Scan on hp2
Filter: ((a = 10) AND (b = 'xxx'::text))
(3 rows)
--- 1476,1482 ----
QUERY PLAN
--------------------------------------------------
Append
! -> Seq Scan on hp0
Filter: ((a = 10) AND (b = 'xxx'::text))
(3 rows)
***************
*** 1494,1504 ****
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
! (7 rows)
-- hash partitiong pruning doesn't occur with <> operator clauses
explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
--- 1494,1502 ----
Append
-> Seq Scan on hp0
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-> Seq Scan on hp3
Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
! (5 rows)
So this same failure occurs on (noting the architecture):
ppc64:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52
ia64:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2018-04-06%2022%3A32%3A24
ppc64 (POWER7):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2018-04-06%2022%3A58%3A13
ppc64 (POWER7):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-04-06%2023%3A02%3A13
powerpc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-04-06%2023%3A05%3A08
powerpc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-04-06%2023%3A13%3A23
powerpc 32-bit userspace on ppc64 host:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07
This seems to be because the hashing function used in partitioning
gives a different answer for a given set of partition key values on
these architectures than on others.
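To make the byte-order dependence concrete, here is a toy sketch in Python. It assumes a hash that consumes the key's raw bytes; `zlib.crc32` is only a stand-in and is not the hash function PostgreSQL uses, and `toy_partition` is a hypothetical helper. The same integer key packed in little-endian and big-endian order is a different byte string, so the remainder modulo 4 can differ between the two layouts.

```python
import struct
import zlib

MODULUS = 4  # mirrors "modulus 4" in the hp test table


def toy_partition(value: int, byte_order: str) -> int:
    """Return a remainder in [0, MODULUS) for a 32-bit integer key.

    byte_order is '<' (little-endian) or '>' (big-endian).
    zlib.crc32 is an assumption made for illustration only; it is
    not PostgreSQL's partition hash.
    """
    raw = struct.pack(byte_order + "i", value)  # 4 raw bytes of the key
    return zlib.crc32(raw) % MODULUS


if __name__ == "__main__":
    # Print the remainder each byte layout would produce for the test keys.
    for key in (1, 10):
        print(key, toy_partition(key, "<"), toy_partition(key, ">"))
```

Nothing here predicts which partition a real server picks; it only shows why a byte-oriented hash is not automatically portable across architectures.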
Thanks,
Amit
On 7 April 2018 at 12:35, Amit Langote <amitlangote09@gmail.com> wrote:
Thank you Alvaro for the rest of the cleanup and for committing.
+10!
So this same failure occurs on (noting the architecture):
ppc64:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quokka&dt=2018-04-06%2020%3A09%3A52
ia64:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2018-04-06%2022%3A32%3A24
ppc64 (POWER7):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2018-04-06%2022%3A58%3A13
ppc64 (POWER7):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-04-06%2023%3A02%3A13
powerpc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-04-06%2023%3A05%3A08
powerpc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2018-04-06%2023%3A13%3A23
powerpc 32-bit userspace on ppc64 host:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07
This seems to be because the hashing function used in partitioning
gives a different answer for a given set of partition key values on
these architectures than on others.
They all look like bigendian CPUs.
https://en.wikipedia.org/wiki/Comparison_of_instruction_set_architectures#Endianness
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 April 2018 at 12:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 12:35, Amit Langote <amitlangote09@gmail.com> wrote:
So this same failure occurs on (noting the architecture):
This seems to be because the hashing function used in partitioning
gives a different answer for a given set of partition key values on
these architectures than on others.
They all look like bigendian CPUs.
I looked at all the regression test diffs for each of the servers you
mentioned and I verified that the diffs match on each of the 7
servers.
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
We could also keep them in the same file, but that's a much bigger
alternative file to maintain and more likely to get broken if someone
forgets to update it.
What do you think?
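The idea of an alternative .out file can be sketched as follows, in Python. This is not the regression driver itself; `matches_any_expected` is a hypothetical helper, and the two plan fragments are copied from the buildfarm diff earlier in this thread for the query with a = 1 and b = 'xxx'. The test passes if the actual output equals any one of the accepted variants.

```python
from typing import Iterable

# Committed expected output (scans hp0).
EXPECTED_DEFAULT = """\
Append
-> Seq Scan on hp0
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
"""

# Output seen on the failing buildfarm members (scans hp2).
EXPECTED_ALTERNATIVE = """\
Append
-> Seq Scan on hp2
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
"""


def matches_any_expected(actual: str, variants: Iterable[str]) -> bool:
    """Return True if the actual output equals one accepted variant.

    Leading and trailing whitespace is ignored; everything else must
    match exactly, so an unexpected partition still fails.
    """
    return any(actual.strip() == variant.strip() for variant in variants)


if __name__ == "__main__":
    variants = [EXPECTED_DEFAULT, EXPECTED_ALTERNATIVE]
    print(matches_any_expected(EXPECTED_ALTERNATIVE, variants))
```

Keeping the hash tests in their own small file means each variant stays short, which is the maintenance argument made above.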
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 7, 2018 at 10:31 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 12:43, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 12:35, Amit Langote <amitlangote09@gmail.com> wrote:
So this same failure occurs on (noting the architecture):
This seems to be because the hashing function used in partitioning
gives a different answer for a given set of partition key values on
these architectures than on others.
They all look like bigendian CPUs.
I looked at all the regression test diffs for each of the servers you
mentioned and I verified that the diffs match on each of the 7
servers.
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
We could also keep them in the same file, but that's a much bigger
alternative file to maintain and more likely to get broken if someone
forgets to update it.
What do you think?
Yeah, that's an idea.
Is it alright though that same data may end up in different hash
partitions depending on the architecture? IIRC, that's the way we
decided to go when using hash partitioning, but it would've been
clearer if there was already some evidence in regression tests that
that's what we've chosen, such as, some existing tests for tuple
routing.
Thanks,
Amit
On 7 April 2018 at 13:50, Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Apr 7, 2018 at 10:31 AM, David Rowley
I looked at all the regression test diffs for each of the servers you
mentioned and I verified that the diffs match on each of the 7
servers.
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
We could also keep them in the same file, but that's a much bigger
alternative file to maintain and more likely to get broken if someone
forgets to update it.
What do you think?
Yeah, that's an idea.
Is it alright though that same data may end up in different hash
partitions depending on the architecture? IIRC, that's the way we
decided to go when using hash partitioning, but it would've been
clearer if there was already some evidence in regression tests that
that's what we've chosen, such as, some existing tests for tuple
routing.
The only alternative would be to change all the hash functions so that
they normalise their endianness. It does not sound like something that
will perform very well. Plus it would break everyone's hash indexes on
a pg_upgrade.
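As a rough sketch of what "normalising endianness" would mean, here is a toy Python comparison. It assumes, purely for illustration, a byte-oriented hash (`zlib.crc32`, which is not PostgreSQL's hash function); `native_hash` and `normalised_hash` are hypothetical names. The normalised variant always serialises the key in one fixed byte order before hashing, so every host computes the same value, at the price of a byte swap on hosts whose native order differs.

```python
import struct
import sys
import zlib


def native_hash(value: int) -> int:
    """Hash the key's bytes in the host's native order ('=' in struct)."""
    return zlib.crc32(struct.pack("=i", value))


def normalised_hash(value: int) -> int:
    """Hash the key's bytes in a fixed little-endian order on every host."""
    return zlib.crc32(struct.pack("<i", value))


if __name__ == "__main__":
    # On a little-endian host both values agree; on a big-endian host
    # only normalised_hash is guaranteed to match other machines.
    print(sys.byteorder, native_hash(10), normalised_hash(10))
```

The sketch says nothing about the performance cost or the effect on existing hash indexes, which are the objections raised above.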
pg_basebackups can't be transferred over to other architectures
anyway, so I'm not so worried about tuples being routed to other
partitions.
Maybe someone else can see a reason why this is bad?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 April 2018 at 13:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
Here's 1 of 2. I thought it was best to get the buildfarm green again
as soon as possible.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Remove-HASH-partition-pruning-tests.patch (application/octet-stream)
From 70a2265495e3d00d481d3fa7340102ae962c1edf Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Sat, 7 Apr 2018 14:40:41 +1200
Subject: [PATCH] Remove HASH partition pruning tests
A value produced by a hash function can vary on different CPU architectures.
This means that the partition which a tuple ends up in a hashed partitioned
table can vary too. This results in varied matching partitions being found
during partition pruning for hash partitioned tables.
This commit removes these HASH partitioned table tests. We'll put these back
again in a separate file in a follow-up commit.
---
src/test/regress/expected/partition_prune.out | 185 --------------------------
src/test/regress/sql/partition_prune.sql | 37 ------
2 files changed, 222 deletions(-)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 2d77b3edd4..69d541eff4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1331,188 +1331,3 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
(3 rows)
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--- hash partitioning
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 1 | xxx
- hp3 | 10 | yyy
- hp1 | | xxx
- hp2 | 10 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp1
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(7 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index ad5177715c..d5ca3cb702 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -237,40 +237,3 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
-
--- hash partitioning
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
-explain (costs off) select * from hp where b = 'xxx';
-explain (costs off) select * from hp where a is null;
-explain (costs off) select * from hp where b is null;
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
-explain (costs off) select * from hp where a = 1 and b is null;
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
-explain (costs off) select * from hp where a is null and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-
-drop table hp;
--
2.16.2.windows.1
On 2018-04-07 14:42:53 +1200, David Rowley wrote:
On 7 April 2018 at 13:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
Here's 1 of 2. I thought it was best to get the buildfarm green again
as soon as possible.
Do you have an estimate of how long it'll take you to produce patch 2? It'd
be cool to get this covered again soon. If you don't have access to a
big endian machine, we can construct the output from the buildfarm... So
pulling the tests out would be the only "urgent" thing; I can go on from
there.
- Andres
On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 13:50, Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Apr 7, 2018 at 10:31 AM, David Rowley
I looked at all the regression test diffs for each of the servers you
mentioned and I verified that the diffs match on each of the 7
servers.
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
We could also keep them in the same file, but that's a much bigger
alternative file to maintain and more likely to get broken if someone
forgets to update it.
What do you think?
Yeah, that's an idea.
Is it alright though that same data may end up in different hash
partitions depending on the architecture? IIRC, that's the way we
decided to go when using hash partitioning, but it would've been
clearer if there was already some evidence in regression tests that
that's what we've chosen, such as, some existing tests for tuple
routing.
The only alternative would be to change all the hash functions so that
they normalise their endianness. It does not sound like something that
will perform very well. Plus it would break everyone's hash indexes on
a pg_upgrade.
pg_basebackups can't be transferred over to other architectures
anyway, so I'm not so worried about tuples being routed to other
partitions.
Maybe someone else can see a reason why this is bad?
I don't think the concept is bad by itself. That's expected; in fact,
we have added an option to pg_dump (dump through parent or some such)
to handle exactly this case. What Amit seems to be complaining about,
though, is the regression test. We need to write regression tests so that
they produce the same plans, pruning the same partitions by name, on all
architectures.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 7 April 2018 at 15:00, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-07 14:42:53 +1200, David Rowley wrote:
On 7 April 2018 at 13:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
Here's 1 of 2. I thought it was best to get the buildfarm green again
as soon as possible.
Do you have an estimate how long it'll take you to produce patch 2? It'd
be cool to get this covered again soon. If you don't have access to a
big endian machine, we can construct the output from the buildfarm... So
pulling the tests out would be the only "urgent" thing, I can go on from
there.
Attached.
I've not tested on a bigendian machine, but the diff -c between the
two output files matches the diff on the failing buildfarm members.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0002-Add-HASH-partition-pruning-tests.patch (application/octet-stream)
From dcbee09fc3c43e84626cd26ddd7c5c1136f490b1 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Sat, 7 Apr 2018 15:02:24 +1200
Subject: [PATCH 2/2] Add HASH partition pruning tests
Two output files must exist as a machine's endianness will control which
partitions match.
---
src/test/regress/expected/partition_prune_hash.out | 189 +++++++++++++++++++++
.../regress/expected/partition_prune_hash_1.out | 187 ++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/partition_prune_hash.sql | 41 +++++
5 files changed, 419 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/partition_prune_hash.out
create mode 100644 src/test/regress/expected/partition_prune_hash_1.out
create mode 100644 src/test/regress/sql/partition_prune_hash.sql
diff --git a/src/test/regress/expected/partition_prune_hash.out b/src/test/regress/expected/partition_prune_hash.out
new file mode 100644
index 0000000000..fbba3f1ff8
--- /dev/null
+++ b/src/test/regress/expected/partition_prune_hash.out
@@ -0,0 +1,189 @@
+--
+-- Test Partition pruning for HASH partitioning
+-- We keep this as a seperate test as hash functions return
+-- values will vary based on CPU architecture.
+--
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 1 | xxx
+ hp3 | 10 | yyy
+ hp1 | | xxx
+ hp2 | 10 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/expected/partition_prune_hash_1.out b/src/test/regress/expected/partition_prune_hash_1.out
new file mode 100644
index 0000000000..4a26a0e277
--- /dev/null
+++ b/src/test/regress/expected/partition_prune_hash_1.out
@@ -0,0 +1,187 @@
+--
+-- Test Partition pruning for HASH partitioning
+-- We keep this as a seperate test as hash functions return
+-- values will vary based on CPU architecture.
+--
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+----+-----
+ hp0 | |
+ hp0 | 1 |
+ hp0 | 10 | xxx
+ hp3 | | xxx
+ hp3 | 10 | yyy
+ hp2 | 1 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 10) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 10) AND (b = 'yyy'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(5 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table hp;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 20d6745730..00c324dd44 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
# ----------
# Another group of parallel tests
# ----------
-test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate fast_default
+test: identity partition_join partition_prune partition_prune_hash reloptions hash_part indexing partition_aggregate fast_default
# event triggers cannot run concurrently with any test that runs DDL
test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index a08169f256..39c3fa9c85 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -184,6 +184,7 @@ test: xml
test: identity
test: partition_join
test: partition_prune
+test: partition_prune_hash
test: reloptions
test: hash_part
test: indexing
diff --git a/src/test/regress/sql/partition_prune_hash.sql b/src/test/regress/sql/partition_prune_hash.sql
new file mode 100644
index 0000000000..fd1783bf53
--- /dev/null
+++ b/src/test/regress/sql/partition_prune_hash.sql
@@ -0,0 +1,41 @@
+--
+-- Test Partition pruning for HASH partitioning
+-- We keep this as a seperate test as hash functions return
+-- values will vary based on CPU architecture.
+--
+
+create table hp (a int, b text) partition by hash (a, b);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (10, 'xxx');
+insert into hp values (10, 'yyy');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'xxx';
+explain (costs off) select * from hp where a = 10 and b = 'yyy';
+explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table hp;
--
2.16.2.windows.1
On 7 April 2018 at 15:03, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
The only alternative would be to change all the hash functions so that
they normalise their endianness. It does not sound like something that
will perform very well. Plus it would break everyone's hash indexes on
a pg_upgrade.
pg_basebackups can't be transferred over to other architectures
anyway, so I'm not so worried about tuples being routed to other
partitions.
Maybe someone else can see a reason why this is bad?
I don't think the concept is bad by itself. That's expected, in fact,
we have added an option to pg_dump (dump through parent or some such)
to handle exactly this case. What Amit seems to be complaining though
is the regression test. We need to write regression tests so that they
produce the same plans, pruning same partitions by name, on all
architectures.
Why is writing tests that produce the same output required?
We have many tests with alternative outputs. Look in
src/test/regress/expected for files matching _1.out
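For anyone unfamiliar with how those alternative files get used, here is a rough sketch of the lookup: a test passes if its output matches the default expected file or any of the numbered variants (_0 through _9). This is a simplified model, not pg_regress itself; it compares text exactly, whereas the real tool runs diff with its own options, and the helper names are invented for the example.

```python
from pathlib import Path


def candidate_expected_files(expected_dir: Path, test_name: str):
    """Yield test_name.out, then test_name_0.out .. test_name_9.out, if present."""
    default = expected_dir / f"{test_name}.out"
    if default.exists():
        yield default
    for i in range(10):
        alt = expected_dir / f"{test_name}_{i}.out"
        if alt.exists():
            yield alt


def output_matches(expected_dir: Path, test_name: str, actual_output: str) -> bool:
    """True if the actual output equals any candidate expected file."""
    return any(
        path.read_text() == actual_output
        for path in candidate_expected_files(expected_dir, test_name)
    )
```

Note that nothing here ties a variant to an architecture: whichever file matches wins.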
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 15:03, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Sat, Apr 7, 2018 at 7:25 AM, David Rowley
The only alternative would be to change all the hash functions so that
they normalise their endianness. It does not sound like something that
will perform very well. Plus it would break everyone's hash indexes on
a pg_upgrade.
pg_basebackups can't be transferred over to other architectures
anyway, so I'm not so worried about tuples being routed to other
partitions.
Maybe someone else can see a reason why this is bad?
I don't think the concept is bad by itself. That's expected, in fact,
we have added an option to pg_dump (dump through parent or some such)
to handle exactly this case. What Amit seems to be complaining though
is the regression test. We need to write regression tests so that they
produce the same plans, pruning same partitions by name, on all
architectures.
Why is writing tests that produce the same output required?
We have many tests with alternative outputs. Look in
src/test/regress/expected for files matching _1.out
That's true, but we usually add such alternative output when we know
all the possible variants, and as long as "all the variants" do not cover
everything possible. AFAIU, that's not true here. Also, on a given
machine a particular row is guaranteed to fall in a given partition.
On a different machine it will fall in some other partition, but
always that partition on that machine. We don't have a way to select
alternate output based on the architecture. Maybe a better idea is to
use a .source file, creating the .out file on the fly based on the
machine's architecture, e.g. by testing the hash output for a given value
to decide which partition it will fall into and then crafting the .out
file with that partition's name.
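A minimal sketch of that idea, assuming a hypothetical `probe` callable that reports which partition a row lands in on the current machine (in practice that would be a query against the server; here it is just a function argument). The template layout and all names are illustrative only.

```python
from typing import Callable, Optional, Tuple

Row = Tuple[Optional[int], Optional[str]]

# Expected EXPLAIN output with the partition name left as a placeholder.
PLAN_TEMPLATE = (
    " Append\n"
    "   ->  Seq Scan on {partition}\n"
    "         Filter: {filter}\n"
    "(3 rows)\n"
)


def render_expected_plan(row: Row, filter_text: str,
                         probe: Callable[[Row], str]) -> str:
    """Fill the template with whichever partition `probe` reports for `row`."""
    return PLAN_TEMPLATE.format(partition=probe(row), filter=filter_text)


if __name__ == "__main__":
    flt = "((a = 10) AND (b = 'xxx'::text))"
    # The two probes mimic the two expected files in the patch above.
    print(render_expected_plan((10, "xxx"), flt, lambda r: "hp2"))
    print(render_expected_plan((10, "xxx"), flt, lambda r: "hp0"))
```

The generated file then names exactly one partition per machine, so a wrong partition cannot pass by matching some other variant.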
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2018-04-07 15:04:37 +1200, David Rowley wrote:
On 7 April 2018 at 15:00, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-07 14:42:53 +1200, David Rowley wrote:
On 7 April 2018 at 13:31, David Rowley <david.rowley@2ndquadrant.com> wrote:
Maybe the best solution is to pull those tests out of
partition_prune.sql then create partition_prune_hash and just have an
alternative .out file with the partitions which match on bigendian
machines.
Here's 1 of 2. I thought it was best to get the buildfarm green again
as soon as possible.
Do you have an estimate how long it'll take you to produce patch 2? It'd
be cool to get this covered again soon. If you don't have access to a
big endian machine, we can construct the output from the buildfarm... So
pulling the tests out would be the only "urgent" thing, I can go on from
there.
Attached.
I've not tested on a bigendian machine, but the diff -c between the
two output files match the diff on the failing buildfarm members.
I've pushed the two patches (collapsed). Trying to get the BF green-ish
again...
- Andres
On 7 April 2018 at 15:18, Andres Freund <andres@anarazel.de> wrote:
I've pushed the two patches (collapsed). Trying to get the BF green-ish
again...
Thanks!
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 April 2018 at 15:14, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
Why is writing tests that produce the same output required?
We have many tests with alternative outputs. Look in
src/test/regress/expected for files matching _1.out
That's true, but we usually add such alternative output when we know
all the variants possible as long as "all the variants" do not cover
everything possible. AFAIU, that's not true here. Also, on a given
machine a particular row is guaranteed to fall in a given partition.
On a different machine it will fall in some other partition, but
always that partition on that machine. We don't have a way to select
alternate output based on the architecture. Maybe a better idea is to
use a .source file, creating the .out on the fly based on the architecture
of the machine, e.g. by testing the hash output for a given value to decide
which partition it will fall into and then crafting the .out with that
partition's name.
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Maybe we can just back up what's just been committed by actually
executing the queries and ensuring that all rows that made it into the
table make it back out again.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 7, 2018 at 8:55 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 7 April 2018 at 15:14, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
On Sat, Apr 7, 2018 at 8:37 AM, David Rowley
Why is writing tests that produce the same output required?
We have many tests with alternative outputs. Look in
src/test/regress/expected for files matching _1.out
That's true, but we usually add such alternative output when we know
all the variants possible as long as "all the variants" do not cover
everything possible. AFAIU, that's not true here. Also, on a given
machine a particular row is guaranteed to fall in a given partition.
On a different machine it will fall in some other partition, but
always that partition on that machine. We don't have a way to select
alternate output based on the architecture. Maybe a better idea is to
use a .source file, creating the .out on the fly based on the architecture
of the machine, e.g. by testing the hash output for a given value to decide
which partition it will fall into and then crafting the .out with that
partition's name.
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yes.
Maybe we can just back up what's just been committed by actually
executing the queries and ensuring that all rows that made it into the
table make it back out again.
That's one way. But how would we make sure that they landed in the proper
partition? Actually we do not know what the proper partition is for a given
architecture. And how would we make sure that all rows with the same
partition key land in the same partition? That's why I am suggesting
to calculate the hash value, take the modulo and craft the name of the
partition where the corresponding row will land on a given architecture.
That way, we are sure that the tuple routing logic is correct and also
the partition pruning logic.
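The suggestion above (hash the key, take the modulo, craft the partition name) can be sketched roughly as follows. This is only an illustration in Python: partition_name, expected_rows, toy_hash and the "hp" prefix are hypothetical names, not part of the regression test framework, and it ignores how PostgreSQL combines the hashes of several key columns before taking the modulo.

```python
# Hypothetical sketch: derive the expected partition name from a hash value.
# None of these helpers exist in PostgreSQL or pg_regress.

def partition_name(hash_value: int, modulus: int, prefix: str = "hp") -> str:
    """Name of the partition whose remainder equals hash_value % modulus."""
    return f"{prefix}{hash_value % modulus}"

def expected_rows(keys, hash_fn, modulus):
    """Build (partition, key) pairs the way a generated .out file might list them."""
    return sorted((partition_name(hash_fn(k), modulus), k) for k in keys)

def toy_hash(key: int) -> int:
    """Stand-in for whatever the platform's hash function returns (an assumption)."""
    return key * 2654435761 % 2**32

rows = expected_rows([1, 2, 3], toy_hash, modulus=4)
for partition, key in rows:
    print(partition, key)
```

Because the expected output is computed with the same hash function the machine uses, the generated file would follow the platform instead of needing one variant per architecture.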
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
I'm also wondering how come we had hash partitioning before and
did not have this sort of problem. Is it just that we added a
new test that's more sensitive to the details of the hashing
(if so, could it be made less so)? Or is there actually more
platform dependence now than before (and if so, why is that)?
regards, tom lane
On 7 April 2018 at 15:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm also wondering how come we had hash partitioning before and
did not have this sort of problem. Is it just that we added a
new test that's more sensitive to the details of the hashing
(if so, could it be made less so)? Or is there actually more
platform dependence now than before (and if so, why is that)?
We didn't prune HASH partitions before today. They were just all returned.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2018-04-06 23:41:22 -0400, Tom Lane wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
There should be only two alternatives, given our current hashing
implementation, right?
Greetings,
Andres Freund
On 7 April 2018 at 15:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
Right, I suggest we wait and see if all members go green again as a
result of 40e42e1024c, and if they're happy then we could maybe leave
it as is with the 2 alternatives output files.
If there are some other variations that crop up, then we can think
harder about what we can do to improve the coverage.
I don't particularly think it matters which hash partition a tuple
goes into, as long as the hash function spreads the values out enough
and most importantly, the pruning code looks for the tuple in the
partition that it was actually inserted into in the first place.
Obviously, we also want to ensure we never do anything which would
change the matching partition in either minor or major version
upgrades too.
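The property described above (pruning must look in the partition the tuple was actually inserted into) can be expressed as a round-trip check. The following is a minimal Python model, not PostgreSQL code; route, load, prune_and_fetch and round_trip_ok are hypothetical names, and passing separate insert_hash and prune_hash functions is just a way to simulate a mismatch between tuple routing and pruning.

```python
from collections import defaultdict

def route(key, hash_fn, modulus):
    """Index of the partition selected for this key."""
    return hash_fn(key) % modulus

def load(keys, insert_hash, modulus):
    """Store every key in the partition chosen at insert time."""
    partitions = defaultdict(list)
    for key in keys:
        partitions[route(key, insert_hash, modulus)].append(key)
    return partitions

def prune_and_fetch(partitions, key, prune_hash, modulus):
    """Equality lookup that scans only the partition left after pruning."""
    survivor = route(key, prune_hash, modulus)
    return [k for k in partitions[survivor] if k == key]

def round_trip_ok(keys, insert_hash, prune_hash, modulus):
    """True if every inserted key is found again through the pruned scan."""
    partitions = load(keys, insert_hash, modulus)
    return all(
        prune_and_fetch(partitions, k, prune_hash, modulus) == [k] * keys.count(k)
        for k in keys
    )

keys = list(range(8))
print(round_trip_ok(keys, lambda k: k, lambda k: k, 4))
print(round_trip_ok(keys, lambda k: k, lambda k: k + 1, 4))
```

A check of this shape does not care which partition a row lands in, only that routing and pruning agree, so it would give the same answer on any architecture.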
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2018-04-07 15:49:54 +1200, David Rowley wrote:
Right, I suggest we wait and see if all members go green again as a
result of 40e42e1024c, and if they're happy then we could maybe leave
it as is with the 2 alternatives output files.
At least the first previously borked animal came back green (termite).
I don't particularly think it matters which hash partition a tuple
goes into, as long as the hash function spreads the values out enough
and most importantly, the pruning code looks for the tuple in the
partition that it was actually inserted into in the first place.
Obviously, we also want to ensure we never do anything which would
change the matching partition in either minor or major version
upgrades too.
+1
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
Greetings,
Andres Freund
On 7 April 2018 at 09:03, Andres Freund <andres@anarazel.de> wrote:
There's also
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2018-04-06%2020%3A45%3A01
*** /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/expected/misc.out 2018-02-20 18:45:02.068665297 -0800
--- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/contrib/sepgsql/results/misc.out 2018-04-06 13:55:50.718253850 -0700
***************
*** 32,40 ****
(6 rows)
SELECT * FROM t1p WHERE o > 50 AND p like '%64%';
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
- LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p"
--- 32,37 ----
seems you just need to remove those rows from the expected file.
Agreed.
Patch attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
part_prune_sepgsql_fix.patch (application/octet-stream)
diff --git a/contrib/sepgsql/expected/misc.out b/contrib/sepgsql/expected/misc.out
index fdf07298bb..32b3bb4f58 100644
--- a/contrib/sepgsql/expected/misc.out
+++ b/contrib/sepgsql/expected/misc.out
@@ -32,9 +32,6 @@ LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_re
(6 rows)
SELECT * FROM t1p WHERE o > 50 AND p like '%64%';
-LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
-LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
-LOG: SELinux: allowed { execute } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure name="pg_catalog.int4le(integer,integer)"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_table name="public.t1p"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column o"
LOG: SELinux: allowed { select } scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 tcontext=unconfined_u:object_r:sepgsql_table_t:s0 tclass=db_column name="table t1p column p"
On 7 April 2018 at 16:09, Andres Freund <andres@anarazel.de> wrote:
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
Oh, thanks!
I had just been looking at that too...
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 April 2018 at 15:18, Andres Freund <andres@anarazel.de> wrote:
I've pushed the two patches (collapsed). Trying to get the BF green-ish
again...
termite has now gone green.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 7, 2018 at 1:09 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-04-07 15:49:54 +1200, David Rowley wrote:
Right, I suggest we wait and see if all members go green again as a
result of 40e42e1024c, and if they're happy then we could maybe leave
it as is with the 2 alternatives output files.
At least the first previously borked animal came back green (termite).
I don't particularly think it matters which hash partition a tuple
goes into, as long as the hash function spreads the values out enough
and most importantly, the pruning code looks for the tuple in the
partition that it was actually inserted into in the first place.
Obviously, we also want to ensure we never do anything which would
change the matching partition in either minor or major version
upgrades too.
+1
+1
Given that the difference only appeared on animals that David pointed
out have big-endian architecture, it seems we'd only need two output
files. It does seem true that the extended hashing functions that
were added to support partitioning would somehow be affected by
endianness.
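As a generic illustration of how endianness can leak into a hash value: if a hash function consumes the raw in-memory bytes of an integer, the same value yields different byte strings, and hence different hashes, on little-endian and big-endian machines. The sketch below uses zlib.crc32 purely as a stand-in; it is not PostgreSQL's hash code and makes no claim about where the difference seen on the buildfarm actually arises. bytewise_hash is a hypothetical helper.

```python
import struct
import zlib

def bytewise_hash(value: int, byte_order: str) -> int:
    """Hash the 4-byte in-memory image of value; byte_order is '<' or '>'."""
    return zlib.crc32(struct.pack(byte_order + "I", value))

little = bytewise_hash(1, "<")   # hashes bytes 01 00 00 00
big = bytewise_hash(1, ">")      # hashes bytes 00 00 00 01

# The two platforms would compute different hashes for the same value, so
# hash % modulus may select different partitions.
print(little != big)

# A hash defined on the value itself does not depend on byte order.
def value_hash(value: int) -> int:
    return value

print(value_hash(1) % 4)
```

This is also why a hash defined purely in terms of the value, like the SQL-language functions used in the existing hash partitioning tests, gives the same partition on every platform.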
Thank you David for creating the patches and Andres for committing them.
Buildfarm seems to be turning green where it had gone red due to the
hashing differences.
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
Thanks Andres.
Regards,
Amit
Amit Langote <amitlangote09@gmail.com> writes:
Given that the difference only appeared on animals that David pointed
out have big-endian architecture, it seems we'd only need two output
files.
Dunno, I'm wondering whether 32 vs 64 bit will make a difference.
regards, tom lane
On Sat, Apr 7, 2018 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
Given that the difference only appeared on animals that David pointed
out have big-endian architecture, it seems we'd only need two output
files.
Dunno, I'm wondering whether 32 vs 64 bit will make a difference.
There was one 32-bit animal in the failing set, which apparently
produces the same hashes as others (allegedly due to endianness
difference).
powerpc 32-bit userspace on ppc64 host:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-06%2023%3A40%3A07
...and it has turned green since the alternative outputs fix went in.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-07%2004%3A06%3A09
Thanks,
Amit
Andres Freund wrote:
Hi,
On 2018-04-07 15:49:54 +1200, David Rowley wrote:
Right, I suggest we wait and see if all members go green again as a
result of 40e42e1024c, and if they're happy then we could maybe leave
it as is with the 2 alternatives output files.
At least the first previously borked animal came back green (termite).
Thanks everyone for addressing this.
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
And this, too. I was unsure if this was because we were missing calling
some object access hook from the new code, or the additional pruning.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
And this, too. I was unsure if this was because we were missing calling
some object access hook from the new code, or the additional pruning.
That's possible. I did attempt to skim the code, that's where my
complaint about the docs originated. There certainly isn't an
InvokeFunctionExecuteHook() present. It's not clear to me whether
that's an issue - we don't invoke the hooks in a significant number of
places either, and as far as I can discern there's not much rule or
reason about where we invoke it.
Greetings,
Andres Freund
Andres Freund wrote:
On 2018-04-07 08:13:23 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
I've also attempted to fix rhinoceros's failure I remarked upon a couple
hours ago in
/messages/by-id/20180406210330.wmqw42wqgiicktli@alap3.anarazel.de
And this, too. I was unsure if this was because we were missing calling
some object access hook from the new code, or the additional pruning.
That's possible. I did attempt to skim the code, that's where my
complaint about the docs originated. There certainly isn't an
InvokeFunctionExecuteHook() present. It's not clear to me whether
that's an issue - we don't invoke the hooks in a significant number of
places either, and as far as I can discern there's not much rule or
reason about where we invoke it.
I managed to convince myself that it's not higher-level code's
responsibility to invoke the execute hooks; the likelihood of bugs of
omission seems just too large. But maybe I'm wrong.
There's a small number of InvokeFunctionExecuteHook calls in the
executor, but I really doubt that it exhaustively covers everyplace
where catalogued functions are called in the executor.
CC'ing KaiGai and Stephen Frost; they may want to chip in here.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
While looking at the docs in [1], I saw that we still mention:
4. Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized
as desired.
This is no longer true. The attached patch removes it.
[1]: https://www.postgresql.org/docs/10/static/ddl-partitioning.html
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Remove-mention-of-constraint_exclusion-in-partitioni.patch (application/octet-stream)
From 483419a3c395413491747f99c66af487a3ebc8fb Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Mon, 9 Apr 2018 15:43:32 +1200
Subject: [PATCH] Remove mention of constraint_exclusion in partitioning docs
As of 9fdb675fc5d2, this GUC now no longer has an affect on partition pruning.
---
doc/src/sgml/ddl.sgml | 8 --------
1 file changed, 8 deletions(-)
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index feb2ab7792..86b71f0e29 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3193,14 +3193,6 @@ CREATE INDEX ON measurement (logdate);
</programlisting>
</para>
</listitem>
-
- <listitem>
- <para>
- Ensure that the <xref linkend="guc-constraint-exclusion"/>
- configuration parameter is not disabled in <filename>postgresql.conf</filename>.
- If it is, queries will not be optimized as desired.
- </para>
- </listitem>
</orderedlist>
</para>
--
2.16.2.windows.1
Hi David.
On 2018/04/09 12:48, David Rowley wrote:
While looking at the docs in [1], I saw that we still mention:
4. Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized
as desired.
This is no longer true. The attached patch removed it.
[1] https://www.postgresql.org/docs/10/static/ddl-partitioning.htm
Thanks. I was aware of the changes that would need to be made, but you
beat me to writing the patch itself.
About the patch:
While users no longer need to enable constraint_exclusion for
select queries, one would still need it for update/delete queries, because
the new pruning logic only gets invoked for the former. Alas...
Also, further down on that page, there is a 5.10.4 Partitioning and
Constraint Exclusion sub-section. I think it would also need some tweaks
due to new developments.
I updated your patch to fix that. Please take a look.
Thanks,
Amit
Attachments:
v2-0001-Remove-mention-of-constraint_exclusion-in-partiti.patch (text/plain; charset=UTF-8)
From a4fe924936fe623ff95e6aa050b8fd7d22dbbb84 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Mon, 9 Apr 2018 15:43:32 +1200
Subject: [PATCH v2] Remove mention of constraint_exclusion in partitioning
docs
As of 9fdb675fc5d2, this GUC now no longer has an affect on partition pruning.
---
doc/src/sgml/ddl.sgml | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index feb2ab7792..eed8753e24 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3194,13 +3194,14 @@ CREATE INDEX ON measurement (logdate);
</para>
</listitem>
- <listitem>
- <para>
- Ensure that the <xref linkend="guc-constraint-exclusion"/>
- configuration parameter is not disabled in <filename>postgresql.conf</filename>.
- If it is, queries will not be optimized as desired.
- </para>
- </listitem>
+ <listitem>
+ <para>
+ Ensure that the <xref linkend="guc-constraint-exclusion"/>
+ configuration parameter is not disabled in <filename>postgresql.conf</filename>.
+ While enabling it is not required for select queries, not doing so will result
+ in update and delete queries to not be optimized as desired.
+ </para>
+ </listitem>
</orderedlist>
</para>
@@ -3767,10 +3768,12 @@ ANALYZE measurement;
</indexterm>
<para>
- <firstterm>Constraint exclusion</firstterm> is a query optimization technique
- that improves performance for partitioned tables defined in the
- fashion described above (both declaratively partitioned tables and those
- implemented using inheritance). As an example:
+ <firstterm>Constraint exclusion</firstterm> is a query optimization
+ technique that improves performance for partitioned tables defined in the
+ fashion described above. While it is used only for update and delete
+ queries in the case of declaratively partitioned tables, it is used for all
+ queries in the case of table partitioning implemented using inheritance.
+ As an example:
<programlisting>
SET constraint_exclusion = on;
--
2.11.0
On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
I'm also wondering how come we had hash partitioning before and
did not have this sort of problem. Is it just that we added a
new test that's more sensitive to the details of the hashing
(if so, could it be made less so)? Or is there actually more
platform dependence now than before (and if so, why is that)?
The existing hash partitioning tests did have some dependencies on the
hash function, but they took care not to use the built-in hash
functions. Instead they did stuff like this:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
I think that this approach should also be used for the new tests.
Variant expected output files are a pain to maintain, and you
basically just have to take whatever output you get as the right
answer, because nobody knows what output a certain built-in hash
function should produce for a given input except by running the code.
If you do the kind of thing shown above, though, then you can easily
see by inspection that you're getting the right answer.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
I'm also wondering how come we had hash partitioning before and
did not have this sort of problem. Is it just that we added a
new test that's more sensitive to the details of the hashing
(if so, could it be made less so)? Or is there actually more
platform dependence now than before (and if so, why is that)?
The existing hash partitioning tests did have some dependencies on the
hash function, but they took care not to use the built-in hash
functions. Instead they did stuff like this:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
I think that this approach should also be used for the new tests.
Variant expected output files are a pain to maintain, and you
basically just have to take whatever output you get as the right
answer, because nobody knows what output a certain built-in hash
function should produce for a given input except by running the code.
If you do the kind of thing shown above, though, then you can easily
see by inspection that you're getting the right answer.
+1.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 2018/04/10 13:27, Ashutosh Bapat wrote:
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 6, 2018 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
Sounds like you're saying that if we have too many alternative files
then there's a chance that one could pass by luck.
Yeah, exactly: it passed, but did it pass for the right reason?
If there's just two expected-files, it's likely not a big problem,
but if you have a bunch it's something to worry about.
I'm also wondering how come we had hash partitioning before and
did not have this sort of problem. Is it just that we added a
new test that's more sensitive to the details of the hashing
(if so, could it be made less so)? Or is there actually more
platform dependence now than before (and if so, why is that)?
The existing hash partitioning tests did have some dependencies on the
hash function, but they took care not to use the built-in hash
functions. Instead they did stuff like this:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
I think that this approach should also be used for the new tests.
Variant expected output files are a pain to maintain, and you
basically just have to take whatever output you get as the right
answer, because nobody knows what output a certain built-in hash
function should produce for a given input except by running the code.
If you do the kind of thing shown above, though, then you can easily
see by inspection that you're getting the right answer.
Thanks for the idea. I think it makes sense and also agree that the
alternate-outputs approach is neither perfectly reliable nor maintainable.
+1.
Attached find a patch that rewrites the hash partition pruning tests that
way. It creates two hash operator classes, one for int4 and another for
text type, and uses them to create the hash partitioned table to be used in the
tests, as done in the existing tests in hash_part.sql. Since that makes the
tests (hopefully) reliably return the same result always, I no longer see
the need to keep them in a separate partition_prune_hash.sql. The
reasoning behind having the separate file was to keep the alternative
output file small as David explained in [1].
However, I noticed that there is a bug in RelationBuildPartitionKey that
causes a crash when using a SQL function as partition support function as
the revised tests do, so please refer to and apply the patches I posted
here before running the revised tests:
/messages/by-id/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp
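To show why custom operator classes make the expected output checkable by hand, here is a rough Python model of how the six rows inserted by the revised tests could be routed. It rests on assumptions that should be verified against the source tree: the body of hash_combine64 (constant and shift amounts) is reproduced from memory of PostgreSQL's hashutils.h, NULL key columns are assumed to contribute nothing to the row hash, and the seed argument is ignored because the test hash functions ignore it. row_partition and the two hash helpers are hypothetical names that mirror pp_hashint4_noop and pp_hashtext_length from the patch.

```python
MASK64 = (1 << 64) - 1

def hash_combine64(a: int, b: int) -> int:
    """Assumed form of PostgreSQL's hash_combine64(), on unsigned 64-bit values."""
    a ^= (b + 0x49A0F4DD15E5A8E3 + (a << 54) + (a >> 7)) & MASK64
    return a & MASK64

def hash_int4_noop(value: int) -> int:
    """Mirrors pp_hashint4_noop: the hash is the value itself."""
    return value

def hash_text_length(value: str) -> int:
    """Mirrors pp_hashtext_length: the hash is the length of the text."""
    return len(value)

def row_partition(a, b, modulus: int = 4) -> str:
    """Partition name for key (a, b), assuming every partition uses the same modulus."""
    row_hash = 0
    for value, hash_fn in ((a, hash_int4_noop), (b, hash_text_length)):
        if value is not None:          # assumption: NULL keys are skipped
            row_hash = hash_combine64(row_hash, hash_fn(value) & MASK64)
    return f"hp{row_hash % modulus}"

rows = [(None, None), (1, None), (1, "xxx"), (None, "xxx"), (2, "xxx"), (1, "abcde")]
for a, b in rows:
    print(row_partition(a, b), a, b)
```

By my calculation this model places the rows in hp0, hp0, hp3, hp2, hp2 and hp1 respectively, which agrees with the tableoid column in the expected output of the attached patch. Nothing in it depends on byte order or word size, which is the point of rolling custom hash functions for the tests.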
Thanks,
Amit
[1]: /messages/by-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg@mail.gmail.com
Attachments:
v1-0001-Rewrite-hash-partition-pruning-tests-to-use-custo.patch (text/plain; charset=UTF-8)
From c1508fc715a7783108f626c67c76fcc1f2303719 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 10 Apr 2018 16:06:33 +0900
Subject: [PATCH v1] Rewrite hash partition pruning tests to use custom opclass
Relying on platform-provided hashing functions makes tests unreliable
as shown by buildfarm recently.
This adds adjusted tests to partition_prune.sql itself and hence
partition_prune_hash.sql is deleted along with two expected output
files.
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
---
src/test/regress/expected/partition_prune.out | 202 ++++++++++++++++++++-
src/test/regress/expected/partition_prune_hash.out | 189 -------------------
.../regress/expected/partition_prune_hash_1.out | 187 -------------------
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 -
src/test/regress/sql/partition_prune.sql | 59 +++++-
src/test/regress/sql/partition_prune_hash.sql | 41 -----
7 files changed, 259 insertions(+), 422 deletions(-)
delete mode 100644 src/test/regress/expected/partition_prune_hash.out
delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out
delete mode 100644 src/test/regress/sql/partition_prune_hash.sql
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index df3fca025e..935e7dc79b 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1330,7 +1330,207 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
Filter: (a > '100000000000000'::bigint)
(3 rows)
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+---+-------
+ hp0 | |
+ hp0 | 1 |
+ hp3 | 1 | xxx
+ hp1 | 1 | abcde
+ hp2 | | xxx
+ hp2 | 2 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 2) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a = 1) AND (b = 'abcde'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp1
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2, hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
--
-- Test runtime partition pruning
--
diff --git a/src/test/regress/expected/partition_prune_hash.out b/src/test/regress/expected/partition_prune_hash.out
deleted file mode 100644
index fbba3f1ff8..0000000000
--- a/src/test/regress/expected/partition_prune_hash.out
+++ /dev/null
@@ -1,189 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 1 | xxx
- hp3 | 10 | yyy
- hp1 | | xxx
- hp2 | 10 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp1
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(7 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/expected/partition_prune_hash_1.out b/src/test/regress/expected/partition_prune_hash_1.out
deleted file mode 100644
index 4a26a0e277..0000000000
--- a/src/test/regress/expected/partition_prune_hash_1.out
+++ /dev/null
@@ -1,187 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 10 | xxx
- hp3 | | xxx
- hp3 | 10 | yyy
- hp2 | 1 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(5 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0d3a27ed41..839d8a4a4d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
# ----------
# Another group of parallel tests
# ----------
-test: identity partition_join partition_prune partition_prune_hash reloptions hash_part indexing partition_aggregate fast_default
+test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate fast_default
# event triggers cannot run concurrently with any test that runs DDL
test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 20027c131c..12e10b3ce4 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -185,7 +185,6 @@ test: xml
test: identity
test: partition_join
test: partition_prune
-test: partition_prune_hash
test: reloptions
test: hash_part
test: indexing
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7fe93bbc04..c02d3e2494 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -236,8 +236,63 @@ create table rparted_by_int2_maxvalue partition of rparted_by_int2 for values fr
-- all partitions but rparted_by_int2_maxvalue pruned
explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
-drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+
+-- hash partitiong pruning doesn't occur with <> operator clauses
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2, hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
--
-- Test runtime partition pruning
@@ -587,4 +642,4 @@ select * from boolp where a = (select value from boolvalues where not value);
drop table boolp;
-reset enable_indexonlyscan;
\ No newline at end of file
+reset enable_indexonlyscan;
diff --git a/src/test/regress/sql/partition_prune_hash.sql b/src/test/regress/sql/partition_prune_hash.sql
deleted file mode 100644
index fd1783bf53..0000000000
--- a/src/test/regress/sql/partition_prune_hash.sql
+++ /dev/null
@@ -1,41 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
-explain (costs off) select * from hp where b = 'xxx';
-explain (costs off) select * from hp where a is null;
-explain (costs off) select * from hp where b is null;
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
-explain (costs off) select * from hp where a = 1 and b is null;
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
-explain (costs off) select * from hp where a is null and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-
-drop table hp;
--
2.11.0
On 10 April 2018 at 20:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/10 13:27, Ashutosh Bapat wrote:
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
Thanks for the idea. I think it makes sense and also agree that the alternate
outputs approach is not perfectly reliable and maintainable.
+1.
Attached find a patch that rewrites hash partition pruning tests that
way. It creates two hash operator classes, one for int4 and another for
text type and uses them to create hash partitioned table to be used in the
tests, like done in the existing tests in hash_part.sql. Since that makes
tests (hopefully) reliably return the same result always, I no longer see
the need to keep them in a separate partition_prune_hash.sql. The
reasoning behind having the separate file was to keep the alternative
output file small as David explained in [1].
[1]
/messages/by-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg@mail.gmail.com
I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.
Apart from that confusion, looking at the patch:
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
Why coalesce here? Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument. Plus the function is
strict, so I'm not sure it's really doing anything even if you added a default.
I know this one was there before, but I only just noticed it:
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
The comment is a bit misleading given the first test below it is
testing for nulls. Maybe it can be changed to
+-- pruning should work if values or is null clauses are provided for
all partition keys.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 10, 2018 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
On 10 April 2018 at 20:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/10 13:27, Ashutosh Bapat wrote:
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
Thanks for the idea. I think it makes sense and also agree that the alternate
outputs approach is not perfectly reliable and maintainable.
+1.
Attached find a patch that rewrites hash partition pruning tests that
way. It creates two hash operator classes, one for int4 and another for
text type and uses them to create hash partitioned table to be used in the
tests, like done in the existing tests in hash_part.sql. Since that makes
tests (hopefully) reliably return the same result always, I no longer see
the need to keep them in a separate partition_prune_hash.sql. The
reasoning behind having the separate file was to keep the alternative
output file small as David explained in [1].
[1]
/messages/by-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg@mail.gmail.com
I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.
Apart from that confusion, looking at the patch:
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
Why coalesce here? Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument. Plus the function is
strict, so I'm not sure it's really doing anything even if you added a default.
I think Amit Langote wanted to write coalesce($1, $2), $2 being the
seed for hash function. See how hash operator class functions are
defined in sql/insert.sql. Maybe we should just use the same
functions or even the same tables.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Thanks for the comment.
On 2018/04/10 21:11, Ashutosh Bapat wrote:
On Tue, Apr 10, 2018 at 5:32 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
Apart from that confusion, looking at the patch:
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
Why coalesce here? Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument. Plus the function is
strict, so I'm not sure it's really doing anything even if you added a default.
I think Amit Langote wanted to write coalesce($1, $2), $2 being the
seed for hash function. See how hash operator class functions are
defined in sql/insert.sql.
Actually, I referenced functions and operator classes defined in
hash_part.sql, not insert.sql. Although as you point out, I didn't think
very hard about the significance of $2 passed to coalesce in those
functions. I will fix that and add it back, along with some other changes
that make them almost identical to the definitions in hash_part.sql.
Maybe we should just use the same
functions or even the same tables.
Because hash_part.sql and partition_prune.sql tests run in parallel, I've
decided to rename the functions, operator classes, and the tables in
partition_prune.sql. It seems like a good idea in any case. Also, since
the existing pruning tests were written with that table, I decided not to
change that.
Will post an updated patch after addressing David's comment.
Regards,
Amit
Thanks for the review.
On 2018/04/10 21:02, David Rowley wrote:
On 10 April 2018 at 20:56, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/10 13:27, Ashutosh Bapat wrote:
On Mon, Apr 9, 2018 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
CREATE TABLE mchash (a int, b text, c jsonb)
PARTITION BY HASH (a test_int4_ops, b test_text_ops);
Thanks for the idea. I think it makes sense and also agree that the alternate
outputs approach is not perfectly reliable and maintainable.
+1.
Attached find a patch that rewrites hash partition pruning tests that
way. It creates two hash operator classes, one for int4 and another for
text type and uses them to create hash partitioned table to be used in the
tests, like done in the existing tests in hash_part.sql. Since that makes
tests (hopefully) reliably return the same result always, I no longer see
the need to keep them in a separate partition_prune_hash.sql. The
reasoning behind having the separate file was to keep the alternative
output file small as David explained in [1].
[1]
/messages/by-id/CAKJS1f-SON_hAekqoV4_WQwJBtJ_rvvSe68jRNhuYcXqQ8PoQg@mail.gmail.com
I had a quick look, but I'm still confused about why a function like
hash_uint32_extended() is susceptible to varying results depending on
CPU endianness but hash_combine64 is not.
It might as well be the combination of both that's sensitive to
endianness. I too am not sure exactly which part. They are both used
in succession in compute_hash_value:
/*
* Compute hash for each datum value by calling respective
* datatype-specific hash functions of each partition key
* attribute.
*/
hash = FunctionCall2(&partsupfunc[i], values[i], seed);
/* Form a single 64-bit hash value */
rowHash = hash_combine64(rowHash, DatumGetUInt64(hash));
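For readers following along, here is a minimal standalone sketch of those two steps as they would apply to the hp test table in the patch below. It is only an illustration under stated assumptions: it assumes NULL key columns are skipped rather than hashed, it copies the mixing arithmetic of hash_combine64() from hashutils.h, and it replaces the two SQL support functions with C stand-ins. The names hashint4_noop, hashtext_length and hp_remainder are made up for this sketch and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same mixing step as hash_combine64(); pure integer arithmetic. */
static uint64_t hash_combine64(uint64_t a, uint64_t b)
{
    a ^= b + UINT64_C(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
    return a;
}

/* Hypothetical stand-in for pp_hashint4_noop: the value is its own hash. */
static uint64_t hashint4_noop(int32_t v)
{
    return (uint64_t) (int64_t) v;
}

/* Hypothetical stand-in for pp_hashtext_length (ASCII input assumed,
 * so the byte length equals the character length). */
static uint64_t hashtext_length(const char *s)
{
    return (uint64_t) strlen(s);
}

/*
 * Remainder that a row (a, b) maps to for the given modulus.
 * a_is_null marks a SQL NULL in column a; b == NULL marks one in column b.
 * Assumption: NULL keys contribute nothing to the row hash.
 */
static int hp_remainder(bool a_is_null, int32_t a, const char *b, int modulus)
{
    uint64_t rowHash = 0;

    assert(modulus > 0);
    if (!a_is_null)
        rowHash = hash_combine64(rowHash, hashint4_noop(a));
    if (b != NULL)
        rowHash = hash_combine64(rowHash, hashtext_length(b));
    return (int) (rowHash % (uint64_t) modulus);
}
```

With modulus 4, hp_remainder(false, 1, "xxx", 4) gives 3 and hp_remainder(true, 0, "xxx", 4) gives 2, which agrees with those rows landing in hp3 and hp2 in the expected output of the v2 patch. Nothing in this arithmetic depends on byte order, so under these assumptions any platform sensitivity would have to come from the datatype-specific hash functions.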
Apart from that confusion, looking at the patch:
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1)::int8$$ LANGUAGE sql IMMUTABLE STRICT;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1))::int8$$ LANGUAGE sql IMMUTABLE STRICT;
Why coalesce here? Maybe I've not thought of something, but coalesce
only seems useful to me if there's > 1 argument. Plus the function is
strict, so not sure it's really doing anything even if you added a default.
After reading Ashutosh's comment, I realized I didn't really mean to add
the STRICT to those function definitions. As these are not operators, but
support (hash) procedures, it's insignificant to the pruning code whether
they are STRICT or not, unlike clause operators where it is.
Also, I've adopted the coalesce-based hashing function from hash_part.sql,
albeit with unnecessary tweaks. I've not read anywhere about why the
coalesce was used in the first place, but it's insignificant for our
purpose here anyway.
I know this one was there before, but I only just noticed it:
+-- pruning should work if non-null values are provided for all the keys
+explain (costs off) select * from hp where a is null and b is null;
The comment is a bit misleading given the first test below it is
testing for nulls. Maybe it can be changed to
+-- pruning should work if values or is null clauses are provided for
all partition keys.
I have adjusted the comments.
Updated patch attached.
Thanks,
Amit
Attachments:
v2-0001-Rewrite-hash-partition-pruning-tests-to-use-custo.patch (text/plain; charset=UTF-8)
From 5a6d00d4d9d6aa8bb84dc9699646ee5c4fa77719 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 10 Apr 2018 16:06:33 +0900
Subject: [PATCH v2] Rewrite hash partition pruning tests to use custom opclass
Relying on platform-provided hashing functions makes tests unreliable
as shown by buildfarm recently.
This adds adjusted tests to partition_prune.sql itself and hence
partition_prune_hash.sql is deleted along with two expected output
files.
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
---
src/test/regress/expected/partition_prune.out | 201 +++++++++++++++++++++
src/test/regress/expected/partition_prune_hash.out | 189 -------------------
.../regress/expected/partition_prune_hash_1.out | 187 -------------------
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 -
src/test/regress/sql/partition_prune.sql | 58 +++++-
src/test/regress/sql/partition_prune_hash.sql | 41 -----
7 files changed, 259 insertions(+), 420 deletions(-)
delete mode 100644 src/test/regress/expected/partition_prune_hash.out
delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out
delete mode 100644 src/test/regress/sql/partition_prune_hash.sql
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index df3fca025e..eb89a5eb67 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1332,6 +1332,207 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+---+-------
+ hp0 | |
+ hp0 | 1 |
+ hp3 | 1 | xxx
+ hp1 | 1 | abcde
+ hp2 | | xxx
+ hp2 | 2 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 2) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a = 1) AND (b = 'abcde'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp1
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+drop table hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
+--
-- Test runtime partition pruning
--
create table ab (a int not null, b int not null) partition by list (a);
diff --git a/src/test/regress/expected/partition_prune_hash.out b/src/test/regress/expected/partition_prune_hash.out
deleted file mode 100644
index fbba3f1ff8..0000000000
--- a/src/test/regress/expected/partition_prune_hash.out
+++ /dev/null
@@ -1,189 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 1 | xxx
- hp3 | 10 | yyy
- hp1 | | xxx
- hp2 | 10 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp1
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(7 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/expected/partition_prune_hash_1.out b/src/test/regress/expected/partition_prune_hash_1.out
deleted file mode 100644
index 4a26a0e277..0000000000
--- a/src/test/regress/expected/partition_prune_hash_1.out
+++ /dev/null
@@ -1,187 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 10 | xxx
- hp3 | | xxx
- hp3 | 10 | yyy
- hp2 | 1 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(5 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0d3a27ed41..839d8a4a4d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
# ----------
# Another group of parallel tests
# ----------
-test: identity partition_join partition_prune partition_prune_hash reloptions hash_part indexing partition_aggregate fast_default
+test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate fast_default
# event triggers cannot run concurrently with any test that runs DDL
test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 20027c131c..12e10b3ce4 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -185,7 +185,6 @@ test: xml
test: identity
test: partition_join
test: partition_prune
-test: partition_prune_hash
test: reloptions
test: hash_part
test: indexing
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7fe93bbc04..6cc8e3cdfc 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -238,6 +238,62 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
+
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+
+drop table hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
--
-- Test runtime partition pruning
@@ -587,4 +643,4 @@ select * from boolp where a = (select value from boolvalues where not value);
drop table boolp;
-reset enable_indexonlyscan;
\ No newline at end of file
+reset enable_indexonlyscan;
diff --git a/src/test/regress/sql/partition_prune_hash.sql b/src/test/regress/sql/partition_prune_hash.sql
deleted file mode 100644
index fd1783bf53..0000000000
--- a/src/test/regress/sql/partition_prune_hash.sql
+++ /dev/null
@@ -1,41 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
-explain (costs off) select * from hp where b = 'xxx';
-explain (costs off) select * from hp where a is null;
-explain (costs off) select * from hp where b is null;
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
-explain (costs off) select * from hp where a = 1 and b is null;
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
-explain (costs off) select * from hp where a is null and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-
-drop table hp;
--
2.11.0
On 11 April 2018 at 18:04, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Updated patch attached.
Thanks for the updated patch.
The only thing I'm not sure about is the changes you've made to the
COALESCE function.
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
Why does one default to the seed and the other to an empty string?
Shouldn't they both do the same thing? If you were to copy the
hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
special reason not to do that?
Also just wondering if it's worth adding some verification that we've
actually eliminated the correct partitions by backing the tests up
with a call to satisfies_hash_partition.
I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
add_satisfies_hash_partition_verification.patch (application/octet-stream)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index e81c4691ec..2a74601fc1 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1034,7 +1034,7 @@ RelationBuildPartitionKey(Relation relation)
procnum,
format_type_be(opclassform->opcintype))));
- fmgr_info(funcid, &key->partsupfunc[i]);
+ fmgr_info_cxt(funcid, &key->partsupfunc[i], partkeycxt);
/* Collation */
key->partcollation[i] = collation->values[i];
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index eb89a5eb67..ba493f04ab 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1475,6 +1475,12 @@ explain (costs off) select * from hp where a is null and b is null;
Filter: ((a IS NULL) AND (b IS NULL))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 0, null::int, null::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where a = 1 and b is null;
QUERY PLAN
-------------------------------------------
@@ -1483,6 +1489,12 @@ explain (costs off) select * from hp where a = 1 and b is null;
Filter: ((b IS NULL) AND (a = 1))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 0, 1, null::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where a = 1 and b = 'xxx';
QUERY PLAN
-------------------------------------------------
@@ -1491,6 +1503,12 @@ explain (costs off) select * from hp where a = 1 and b = 'xxx';
Filter: ((a = 1) AND (b = 'xxx'::text))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 3, 1, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where a is null and b = 'xxx';
QUERY PLAN
-----------------------------------------------------
@@ -1499,6 +1517,12 @@ explain (costs off) select * from hp where a is null and b = 'xxx';
Filter: ((a IS NULL) AND (b = 'xxx'::text))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 2, null::int, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where a = 2 and b = 'xxx';
QUERY PLAN
-------------------------------------------------
@@ -1507,6 +1531,12 @@ explain (costs off) select * from hp where a = 2 and b = 'xxx';
Filter: ((a = 2) AND (b = 'xxx'::text))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 2, 2, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where a = 1 and b = 'abcde';
QUERY PLAN
---------------------------------------------------
@@ -1515,6 +1545,12 @@ explain (costs off) select * from hp where a = 1 and b = 'abcde';
Filter: ((a = 1) AND (b = 'abcde'::text))
(3 rows)
+select satisfies_hash_partition('hp'::regclass, 4, 1, 1, 'abcde'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6cc8e3cdfc..787468f290 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -282,11 +282,23 @@ explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-- pruning should work if either a value or a IS NULL clause is provided for
-- each of the keys
explain (costs off) select * from hp where a is null and b is null;
+select satisfies_hash_partition('hp'::regclass, 4, 0, null::int, null::text);
+
explain (costs off) select * from hp where a = 1 and b is null;
+select satisfies_hash_partition('hp'::regclass, 4, 0, 1, null::text);
+
explain (costs off) select * from hp where a = 1 and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 3, 1, 'xxx'::text);
+
explain (costs off) select * from hp where a is null and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 2, null::int, 'xxx'::text);
+
explain (costs off) select * from hp where a = 2 and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 2, 2, 'xxx'::text);
+
explain (costs off) select * from hp where a = 1 and b = 'abcde';
+select satisfies_hash_partition('hp'::regclass, 4, 1, 1, 'abcde'::text);
+
explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
drop table hp;
Hi David.
Thanks for the review.
On 2018/04/11 17:59, David Rowley wrote:
On 11 April 2018 at 18:04, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Updated patch attached.
Thanks for the updated patch.
The only thing I'm not sure about is the changes you've made to the
COALESCE function.
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, $2)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
Why does one default to the seed and the other to an empty string?
Shouldn't they both do the same thing? If you were to copy the
hash_part.sql you'd just coalesce($1, 0) and coalesce($1, ''), any
special reason not to do that?
Oops, so I hadn't actually restored it to the way it is in hash_part.sql.
Also, Ashutosh was talking about the custom hashing function used in
insert.sql, not hash_part.sql, which I based my revision upon.
Fixed it now.
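To make the difference David asks about concrete, here is a small Python model (an illustration, not part of the patch) of the two NULL-handling shapes for the int4 support function, next to the text one. The Python function names are made up for this sketch, and the second argument stands in for the seed that SQL receives as $2.

```python
def hashint4_seed_default(value, seed):
    """Shape quoted from v2: coalesce($1, $2)::int8, so NULL maps to the seed."""
    return seed if value is None else value


def hashint4_zero_default(value, seed):
    """Shape after the fix: coalesce($1, 0)::int8, so NULL maps to 0."""
    return 0 if value is None else value


def hashtext_length(value, seed):
    """length(coalesce($1, ''))::int8, so NULL maps to the length of ''."""
    return len("" if value is None else value)


if __name__ == "__main__":
    seed = 42  # arbitrary stand-in; the real seed is chosen by the server
    # Non-NULL inputs: both int4 variants return the value unchanged.
    print(hashint4_seed_default(7, seed), hashint4_zero_default(7, seed))
    # NULL input: the variants diverge (seed versus 0).
    print(hashint4_seed_default(None, seed), hashint4_zero_default(None, seed))
    # The text function never looks at the seed.
    print(hashtext_length(None, seed), hashtext_length("xxx", seed))
```

The two int4 variants only differ when the input is NULL, so the choice matters only if the caller actually passes NULL keys to the support function.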
Also just wondering if it's worth adding some verification that we've
actually eliminated the correct partitions by backing the tests up
with a call to satisfies_hash_partition.
I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?
We can check by inspection that individual values are in the appropriate
partitions, which is the point of having the inserts and the select just
above the actual pruning-related tests. So, I'm not sure that adding a
satisfies_hash_partition call for each pruning test adds much.
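The by-inspection check can also be reproduced outside the server. The following is a minimal Python sketch, not code from the patch, of how a row of hp would be routed under the custom opclasses. It assumes that hash partitioning skips NULL key columns, folds the remaining per-column hashes together with hash_combine64() starting from zero, and takes the result modulo the partition modulus. The combining constant is quoted from memory of src/include/utils/hashutils.h and should be checked against the tree; the Python names are illustrative only.

```python
MASK64 = (1 << 64) - 1

# Assumption: constant used by hash_combine64() in src/include/utils/hashutils.h.
COMBINE_CONST = 0x49A0F4DD15E5A8E3


def hash_combine64(a, b):
    """Model of hash_combine64(): a ^= b + CONST + (a << 54) + (a >> 7), in uint64."""
    mixed = (b + COMBINE_CONST + (a << 54) + (a >> 7)) & MASK64
    return a ^ mixed


def pp_hashint4_noop(value):
    """The int4 support function from the patch: the value itself, as uint64."""
    return value & MASK64


def pp_hashtext_length(value):
    """The text support function from the patch: the string length."""
    return len(value)


def partition_remainder(a, b, modulus=4):
    """Remainder that selects the partition for row (a, b) of hp."""
    row_hash = 0
    for value, hashfn in ((a, pp_hashint4_noop), (b, pp_hashtext_length)):
        if value is None:
            continue  # assumption: NULL keys do not contribute to the row hash
        row_hash = hash_combine64(row_hash, hashfn(value))
    return row_hash % modulus


if __name__ == "__main__":
    rows = [(None, None), (1, None), (1, "xxx"),
            (None, "xxx"), (2, "xxx"), (1, "abcde")]
    for a, b in rows:
        print((a, b), "-> hp%d" % partition_remainder(a, b))
```

Under these assumptions the six inserted rows map to remainders 0, 0, 3, 2, 2 and 1, which agrees with the tableoid column in the expected output of the patch (hp0, hp0, hp3, hp2, hp2, hp1).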
Attached revised patch.
Thanks,
Amit
Attachments:
v3-0001-Rewrite-hash-partition-pruning-tests-to-use-custo.patch (text/plain; charset=UTF-8)
From 4685448a7eb2eaf5feceea2206d136c135b2dea7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 10 Apr 2018 16:06:33 +0900
Subject: [PATCH v3] Rewrite hash partition pruning tests to use custom opclass
Relying on platform-provided hashing functions makes tests unreliable
as shown by buildfarm recently.
This adds adjusted tests to partition_prune.sql itself and hence
partition_prune_hash.sql is deleted along with two expected output
files.
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
---
src/test/regress/expected/partition_prune.out | 237 +++++++++++++++++++++
src/test/regress/expected/partition_prune_hash.out | 189 ----------------
.../regress/expected/partition_prune_hash_1.out | 187 ----------------
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 -
src/test/regress/sql/partition_prune.sql | 70 +++++-
src/test/regress/sql/partition_prune_hash.sql | 41 ----
7 files changed, 307 insertions(+), 420 deletions(-)
delete mode 100644 src/test/regress/expected/partition_prune_hash.out
delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out
delete mode 100644 src/test/regress/sql/partition_prune_hash.sql
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index df3fca025e..d13389b9c2 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1332,6 +1332,243 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, 0)::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+---+-------
+ hp0 | |
+ hp0 | 1 |
+ hp3 | 1 | xxx
+ hp1 | 1 | abcde
+ hp2 | | xxx
+ hp2 | 2 | xxx
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 0, null::int, null::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 0, 1, null::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 3, 1, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 2, null::int, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 2) AND (b = 'xxx'::text))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 2, 2, 'xxx'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((a = 1) AND (b = 'abcde'::text))
+(3 rows)
+
+select satisfies_hash_partition('hp'::regclass, 4, 1, 1, 'abcde'::text);
+ satisfies_hash_partition
+--------------------------
+ t
+(1 row)
+
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp1
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+drop table hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
+--
-- Test runtime partition pruning
--
create table ab (a int not null, b int not null) partition by list (a);
diff --git a/src/test/regress/expected/partition_prune_hash.out b/src/test/regress/expected/partition_prune_hash.out
deleted file mode 100644
index fbba3f1ff8..0000000000
--- a/src/test/regress/expected/partition_prune_hash.out
+++ /dev/null
@@ -1,189 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 1 | xxx
- hp3 | 10 | yyy
- hp1 | | xxx
- hp2 | 10 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp1
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(7 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/expected/partition_prune_hash_1.out b/src/test/regress/expected/partition_prune_hash_1.out
deleted file mode 100644
index 4a26a0e277..0000000000
--- a/src/test/regress/expected/partition_prune_hash_1.out
+++ /dev/null
@@ -1,187 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 10 | xxx
- hp3 | | xxx
- hp3 | 10 | yyy
- hp2 | 1 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(5 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0d3a27ed41..839d8a4a4d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
# ----------
# Another group of parallel tests
# ----------
-test: identity partition_join partition_prune partition_prune_hash reloptions hash_part indexing partition_aggregate fast_default
+test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate fast_default
# event triggers cannot run concurrently with any test that runs DDL
test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 20027c131c..12e10b3ce4 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -185,7 +185,6 @@ test: xml
test: identity
test: partition_join
test: partition_prune
-test: partition_prune_hash
test: reloptions
test: hash_part
test: indexing
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7fe93bbc04..f18c4c7f9e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -238,6 +238,74 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+-- We roll our own operator classes to use for tests, because depending on the
+-- platform-provided hashing functions makes tests unreliable
+--
+
+CREATE OR REPLACE FUNCTION pp_hashint4_noop(int4, int8) RETURNS int8 AS
+$$SELECT coalesce($1, 0)::int8$$ LANGUAGE sql IMMUTABLE;
+
+CREATE OPERATOR CLASS pp_test_int4_ops FOR TYPE int4 USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashint4_noop(int4, int8);
+
+CREATE OR REPLACE FUNCTION pp_hashtext_length(text, int8) RETURNS int8 AS
+$$SELECT length(coalesce($1, ''))::int8$$ LANGUAGE sql IMMUTABLE;
+
+CREATE OPERATOR CLASS pp_test_text_ops FOR TYPE text USING HASH AS
+OPERATOR 1 = , FUNCTION 2 pp_hashtext_length(text, int8);
+
+create table hp (a int, b text) partition by hash (a pp_test_int4_ops, b pp_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+select satisfies_hash_partition('hp'::regclass, 4, 0, null::int, null::text);
+
+explain (costs off) select * from hp where a = 1 and b is null;
+select satisfies_hash_partition('hp'::regclass, 4, 0, 1, null::text);
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 3, 1, 'xxx'::text);
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 2, null::int, 'xxx'::text);
+
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+select satisfies_hash_partition('hp'::regclass, 4, 2, 2, 'xxx'::text);
+
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+select satisfies_hash_partition('hp'::regclass, 4, 1, 1, 'abcde'::text);
+
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+
+drop table hp;
+drop operator class pp_test_text_ops using hash;
+drop operator class pp_test_int4_ops using hash;
+drop function pp_hashint4_noop(int4, int8);
+drop function pp_hashtext_length(text, int8);
--
-- Test runtime partition pruning
@@ -587,4 +655,4 @@ select * from boolp where a = (select value from boolvalues where not value);
drop table boolp;
-reset enable_indexonlyscan;
\ No newline at end of file
+reset enable_indexonlyscan;
diff --git a/src/test/regress/sql/partition_prune_hash.sql b/src/test/regress/sql/partition_prune_hash.sql
deleted file mode 100644
index fd1783bf53..0000000000
--- a/src/test/regress/sql/partition_prune_hash.sql
+++ /dev/null
@@ -1,41 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
-explain (costs off) select * from hp where b = 'xxx';
-explain (costs off) select * from hp where a is null;
-explain (costs off) select * from hp where b is null;
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
-explain (costs off) select * from hp where a = 1 and b is null;
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
-explain (costs off) select * from hp where a is null and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-
-drop table hp;
--
2.11.0
On Wed, Apr 11, 2018 at 2:52 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?
We can check by inspection that individual values are in the appropriate
partitions, which is the point of having the inserts and the select just
above the actual pruning-related tests. So, I'm not sure that adding a
satisfies_hash_partition call for each pruning test adds much.
+1.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On 11 April 2018 at 21:22, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
Also just wondering if it's worth adding some verification that we've
actually eliminated the correct partitions by backing the tests up
with a call to satisfies_hash_partition.
I've attached a delta patch that applies to your v2 which does this.
Do you think it's worth doing?
We can check by inspection that individual values are in the appropriate
partitions, which is the point of having the inserts and the select just
above the actual pruning-related tests. So, I'm not sure that adding a
satisfies_hash_partition check for each pruning test adds much.
Right, that's true.
Attached revised patch.
Thanks. It looks fine to me, with or without the
satisfies_hash_partition tests. I agree that they're probably
overkill, but I see you've added them now.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
(This would require a few updates to insert.sql because the definitions
there are different, but it shouldn't be a problem coverage-wise.)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/04/11 21:35, Alvaro Herrera wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
(This would require a few updates to insert.sql because the definitions
there are different, but it shouldn't be a problem coverage-wise.)
OK, I've tried doing that. Needed adjustments to hash_part.sql as well.
The hash function for int4 was defined differently in insert.sql,
alter_table.sql, and hash_part.sql. I went with the definition in
insert.sql, which, although slightly different from the one in
alter_table.sql, didn't affect the latter's output in any way. Since the
definition in hash_part.sql was totally different, a couple of tests
needed adjusting after starting to use hash opclasses defined in insert.sql.
Attached updated patch.
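
For readers who want to see why the shared opclasses make partition assignment predictable, here is a rough Python model of how a row of hp could be routed with the insert.sql definitions chosen above. It is only a sketch under assumptions: the seed and the combining constant are quoted from memory of the PostgreSQL sources (HASH_PARTITION_SEED and hash_combine64) and should be checked against the tree, and partition_remainder is a hypothetical helper name, not a server function.

```python
# Rough model of hash-partition routing with the hand-rolled opclasses.
# ASSUMPTION: both constants below are recalled from the PostgreSQL sources
# (HASH_PARTITION_SEED and the constant used by hash_combine64); verify them
# against the tree before relying on this.
MASK64 = (1 << 64) - 1
HASH_PARTITION_SEED = 0x7A5B22367996DCFD
HASH_COMBINE64_CONST = 0x49A0F4DD15E5A8E3


def part_hashint4_noop(value, seed):
    """Mirrors the SQL function: returns value + seed (wrapped to 64 bits)."""
    return (value + seed) & MASK64


def part_hashtext_length(value, seed):
    """Mirrors the SQL function: returns the string length, ignoring the seed."""
    return len(value or "")


def hash_combine64(a, b):
    """Assumed shape of hash_combine64, using unsigned 64-bit arithmetic."""
    return a ^ ((b + HASH_COMBINE64_CONST + (a << 54) + (a >> 7)) & MASK64)


def partition_remainder(a, b, modulus=4):
    """Hypothetical helper: remainder of the partition that row (a, b) maps to.

    NULL (None) keys contribute nothing to the row hash.
    """
    row_hash = 0
    for value, func in ((a, part_hashint4_noop), (b, part_hashtext_length)):
        if value is not None:
            row_hash = hash_combine64(row_hash, func(value, HASH_PARTITION_SEED))
    return row_hash % modulus


if __name__ == "__main__":
    rows = [(None, None), (1, None), (1, "xxx"),
            (None, "xxx"), (2, "xxx"), (1, "abcde")]
    for a, b in rows:
        print(f"hp{partition_remainder(a, b)}: a={a!r} b={b!r}")
```

Under these assumptions the model places the six rows inserted by the new partition_prune test in hp0, hp1, hp0, hp2, hp3 and hp2 respectively, which agrees with the expected output in the attached patch. Because neither function depends on platform-specific hashing, the mapping should be the same on every machine.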
PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
that there are hash partitioning related tests in create_table,
foreign_key, and partition_join files as well. Do we want to use the
custom opclass in those files as well?
Thanks,
Amit
Attachments:
v4-0001-Rewrite-hash-partition-pruning-tests-to-use-custo.patch (text/plain; charset=UTF-8)
From 5a01d81aa7e90ef130b245c5e38b02fe9be5e8d7 Mon Sep 17 00:00:00 2001
From: amit <amitlangote09@gmail.com>
Date: Tue, 10 Apr 2018 16:06:33 +0900
Subject: [PATCH v4] Rewrite hash partition pruning tests to use custom opclass
Relying on platform-provided hashing functions makes tests unreliable
as shown by buildfarm recently.
This adds adjusted tests to partition_prune.sql itself and hence
partition_prune_hash.sql is deleted along with two expected output
files.
Discussion: https://postgr.es/m/CA%2BTgmoZ0D5kJbt8eKXtvVdvTcGGWn6ehWCRSZbWytD-uzH92mQ%40mail.gmail.com
---
src/test/regress/expected/alter_table.out | 15 +-
src/test/regress/expected/hash_part.out | 23 +--
src/test/regress/expected/insert.out | 32 +++-
src/test/regress/expected/partition_prune.out | 191 +++++++++++++++++++++
src/test/regress/expected/partition_prune_hash.out | 189 --------------------
.../regress/expected/partition_prune_hash_1.out | 187 --------------------
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 -
src/test/regress/sql/alter_table.sql | 15 +-
src/test/regress/sql/hash_part.sql | 24 +--
src/test/regress/sql/insert.sql | 36 +++-
src/test/regress/sql/partition_prune.sql | 44 ++++-
src/test/regress/sql/partition_prune_hash.sql | 41 -----
13 files changed, 305 insertions(+), 495 deletions(-)
delete mode 100644 src/test/regress/expected/partition_prune_hash.out
delete mode 100644 src/test/regress/expected/partition_prune_hash_1.out
delete mode 100644 src/test/regress/sql/partition_prune_hash.sql
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 63845910a6..50b9443e2d 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3662,20 +3662,13 @@ CREATE TABLE quuux2 PARTITION OF quuux FOR VALUES IN (2);
INFO: updated partition constraint for default partition "quuux_default1" is implied by existing constraints
DROP TABLE quuux;
-- check validation when attaching hash partitions
--- The default hash functions as they exist today aren't portable; they can
--- return different results on different machines. Depending upon how the
--- values are hashed, the row may map to different partitions, which result in
--- regression failure. To avoid this, let's create a non-default hash function
--- that just returns the input value unchanged.
-CREATE OR REPLACE FUNCTION dummy_hashint4(a int4, seed int8) RETURNS int8 AS
-$$ BEGIN RETURN (a + 1 + seed); END; $$ LANGUAGE 'plpgsql' IMMUTABLE;
-CREATE OPERATOR CLASS custom_opclass FOR TYPE int4 USING HASH AS
-OPERATOR 1 = , FUNCTION 2 dummy_hashint4(int4, int8);
+-- Use hand-rolled hash functions and operator class to get predictable result
+-- on different matchines. part_test_int4_ops is defined in insert.sql.
-- check that the new partition won't overlap with an existing partition
CREATE TABLE hash_parted (
a int,
b int
-) PARTITION BY HASH (a custom_opclass);
+) PARTITION BY HASH (a part_test_int4_ops);
CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE fail_part (LIKE hpart_1);
ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4);
@@ -3840,8 +3833,6 @@ SELECT * FROM list_parted;
DROP TABLE list_parted, list_parted2, range_parted;
DROP TABLE fail_def_part;
DROP TABLE hash_parted;
-DROP OPERATOR CLASS custom_opclass USING HASH;
-DROP FUNCTION dummy_hashint4(a int4, seed int8);
-- more tests for certain multi-level partitioning scenarios
create table p (a int, b int) partition by range (a, b);
create table p1 (b int, a int not null) partition by range (b);
diff --git a/src/test/regress/expected/hash_part.out b/src/test/regress/expected/hash_part.out
index 9e9e56f6fc..731d26fc3d 100644
--- a/src/test/regress/expected/hash_part.out
+++ b/src/test/regress/expected/hash_part.out
@@ -1,16 +1,11 @@
--
-- Hash partitioning.
--
-CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
-$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
-CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
-OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
-CREATE OR REPLACE FUNCTION hashtext_length(text, int8) RETURNS int8 AS
-$$SELECT length(coalesce($1,''))::int8$$ LANGUAGE sql IMMUTABLE;
-CREATE OPERATOR CLASS test_text_ops FOR TYPE text USING HASH AS
-OPERATOR 1 = , FUNCTION 2 hashtext_length(text, int8);
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different matchines. See the definitions of
+-- part_part_test_int4_ops and part_test_text_ops in insert.sql.
CREATE TABLE mchash (a int, b text, c jsonb)
- PARTITION BY HASH (a test_int4_ops, b test_text_ops);
+ PARTITION BY HASH (a part_test_int4_ops, b part_test_text_ops);
CREATE TABLE mchash1
PARTITION OF mchash FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- invalid OID, no such table
@@ -66,7 +61,7 @@ SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 0, ''::text);
(1 row)
-- ok, should be true
-SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 1, ''::text);
+SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 2, ''::text);
satisfies_hash_partition
--------------------------
t
@@ -79,7 +74,7 @@ SELECT satisfies_hash_partition('mchash'::regclass, 2, 1,
ERROR: column 2 of the partition key has type "text", but supplied value is of type "integer"
-- multiple partitioning columns of the same type
CREATE TABLE mcinthash (a int, b int, c jsonb)
- PARTITION BY HASH (a test_int4_ops, b test_int4_ops);
+ PARTITION BY HASH (a part_test_int4_ops, b part_test_int4_ops);
-- now variadic should work, should be false
SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
variadic array[0, 0]);
@@ -90,7 +85,7 @@ SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
-- should be true
SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
- variadic array[1, 0]);
+ variadic array[0, 1]);
satisfies_hash_partition
--------------------------
t
@@ -107,7 +102,3 @@ ERROR: column 1 of the partition key has type "integer", but supplied value is
-- cleanup
DROP TABLE mchash;
DROP TABLE mcinthash;
-DROP OPERATOR CLASS test_text_ops USING hash;
-DROP OPERATOR CLASS test_int4_ops USING hash;
-DROP FUNCTION hashint4_noop(int4, int8);
-DROP FUNCTION hashtext_length(text, int8);
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 97419a744f..5edf269367 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -387,15 +387,31 @@ select tableoid::regclass::text, a, min(b) as min_b, max(b) as max_b from list_p
(9 rows)
-- direct partition inserts should check hash partition bound constraint
--- create custom operator class and hash function, for the same reason
--- explained in alter_table.sql
-create or replace function dummy_hashint4(a int4, seed int8) returns int8 as
-$$ begin return (a + seed); end; $$ language 'plpgsql' immutable;
-create operator class custom_opclass for type int4 using hash as
-operator 1 = , function 2 dummy_hashint4(int4, int8);
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different matchines. The hash function for int4 simply returns
+-- the sum of the values passed to it and the one for text returns the length
+-- of the non-empty string value passed to it or 0.
+create or replace function part_hashint4_noop(value int4, seed int8)
+returns int8 as $$
+select value + seed;
+$$ language sql immutable;
+create operator class part_test_int4_ops
+for type int4
+using hash as
+operator 1 =,
+function 2 part_hashint4_noop(int4, int8);
+create or replace function part_hashtext_length(value text, seed int8)
+RETURNS int8 AS $$
+select length(coalesce(value, ''))::int8
+$$ language sql immutable;
+create operator class part_test_text_ops
+for type text
+using hash as
+operator 1 =,
+function 2 part_hashtext_length(text, int8);
create table hash_parted (
a int
-) partition by hash (a custom_opclass);
+) partition by hash (a part_test_int4_ops);
create table hpart0 partition of hash_parted for values with (modulus 4, remainder 0);
create table hpart1 partition of hash_parted for values with (modulus 4, remainder 1);
create table hpart2 partition of hash_parted for values with (modulus 4, remainder 2);
@@ -449,8 +465,6 @@ Partitions: part_aa_bb FOR VALUES IN ('aa', 'bb'),
-- cleanup
drop table range_parted, list_parted;
drop table hash_parted;
-drop operator class custom_opclass using hash;
-drop function dummy_hashint4(a int4, seed int8);
-- test that a default partition added as the first partition accepts any value
-- including null
create table list_parted (a int) partition by list (a);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index df3fca025e..12b1e85725 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1332,6 +1332,197 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
--
+-- Test Partition pruning for HASH partitioning
+--
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different matchines. See the definitions of
+-- part_part_test_int4_ops and part_test_text_ops in insert.sql.
+--
+create table hp (a int, b text) partition by hash (a part_test_int4_ops, b part_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+ tableoid | a | b
+----------+---+-------
+ hp0 | |
+ hp0 | 1 | xxx
+ hp3 | 2 | xxx
+ hp1 | 1 |
+ hp2 | | xxx
+ hp2 | 1 | abcde
+(6 rows)
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+ QUERY PLAN
+-------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a = 1)
+ -> Seq Scan on hp1
+ Filter: (a = 1)
+ -> Seq Scan on hp2
+ Filter: (a = 1)
+ -> Seq Scan on hp3
+ Filter: (a = 1)
+(9 rows)
+
+explain (costs off) select * from hp where b = 'xxx';
+ QUERY PLAN
+-----------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp1
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp2
+ Filter: (b = 'xxx'::text)
+ -> Seq Scan on hp3
+ Filter: (b = 'xxx'::text)
+(9 rows)
+
+explain (costs off) select * from hp where a is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (a IS NULL)
+ -> Seq Scan on hp1
+ Filter: (a IS NULL)
+ -> Seq Scan on hp2
+ Filter: (a IS NULL)
+ -> Seq Scan on hp3
+ Filter: (a IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where b is null;
+ QUERY PLAN
+-----------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (b IS NULL)
+ -> Seq Scan on hp1
+ Filter: (b IS NULL)
+ -> Seq Scan on hp2
+ Filter: (b IS NULL)
+ -> Seq Scan on hp3
+ Filter: (b IS NULL)
+(9 rows)
+
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a < 1) AND (b = 'xxx'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+ QUERY PLAN
+--------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b = 'yyy'::text))
+(9 rows)
+
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp1
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp2
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+ -> Seq Scan on hp3
+ Filter: ((a <> 1) AND (b <> 'xxx'::text))
+(9 rows)
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+ QUERY PLAN
+-----------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a IS NULL) AND (b IS NULL))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b is null;
+ QUERY PLAN
+-------------------------------------------
+ Append
+ -> Seq Scan on hp1
+ Filter: ((b IS NULL) AND (a = 1))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: ((a = 1) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a is null and b = 'xxx';
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a IS NULL) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+ QUERY PLAN
+-------------------------------------------------
+ Append
+ -> Seq Scan on hp3
+ Filter: ((a = 2) AND (b = 'xxx'::text))
+(3 rows)
+
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+ QUERY PLAN
+---------------------------------------------------
+ Append
+ -> Seq Scan on hp2
+ Filter: ((a = 1) AND (b = 'abcde'::text))
+(3 rows)
+
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on hp0
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp2
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+ -> Seq Scan on hp3
+ Filter: (((a = 1) AND (b = 'abcde'::text)) OR ((a = 2) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
+(7 rows)
+
+drop table hp;
+--
-- Test runtime partition pruning
--
create table ab (a int not null, b int not null) partition by list (a);
diff --git a/src/test/regress/expected/partition_prune_hash.out b/src/test/regress/expected/partition_prune_hash.out
deleted file mode 100644
index fbba3f1ff8..0000000000
--- a/src/test/regress/expected/partition_prune_hash.out
+++ /dev/null
@@ -1,189 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 1 | xxx
- hp3 | 10 | yyy
- hp1 | | xxx
- hp2 | 10 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp1
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp2
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(7 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/expected/partition_prune_hash_1.out b/src/test/regress/expected/partition_prune_hash_1.out
deleted file mode 100644
index 4a26a0e277..0000000000
--- a/src/test/regress/expected/partition_prune_hash_1.out
+++ /dev/null
@@ -1,187 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
- tableoid | a | b
-----------+----+-----
- hp0 | |
- hp0 | 1 |
- hp0 | 10 | xxx
- hp3 | | xxx
- hp3 | 10 | yyy
- hp2 | 1 | xxx
-(6 rows)
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
- QUERY PLAN
--------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a = 1)
- -> Seq Scan on hp1
- Filter: (a = 1)
- -> Seq Scan on hp2
- Filter: (a = 1)
- -> Seq Scan on hp3
- Filter: (a = 1)
-(9 rows)
-
-explain (costs off) select * from hp where b = 'xxx';
- QUERY PLAN
------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp1
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp2
- Filter: (b = 'xxx'::text)
- -> Seq Scan on hp3
- Filter: (b = 'xxx'::text)
-(9 rows)
-
-explain (costs off) select * from hp where a is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (a IS NULL)
- -> Seq Scan on hp1
- Filter: (a IS NULL)
- -> Seq Scan on hp2
- Filter: (a IS NULL)
- -> Seq Scan on hp3
- Filter: (a IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where b is null;
- QUERY PLAN
------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (b IS NULL)
- -> Seq Scan on hp1
- Filter: (b IS NULL)
- -> Seq Scan on hp2
- Filter: (b IS NULL)
- -> Seq Scan on hp3
- Filter: (b IS NULL)
-(9 rows)
-
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a < 1) AND (b = 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a < 1) AND (b = 'xxx'::text))
-(9 rows)
-
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b = 'yyy'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b = 'yyy'::text))
-(9 rows)
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
- QUERY PLAN
------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a IS NULL) AND (b IS NULL))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b is null;
- QUERY PLAN
--------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((b IS NULL) AND (a = 1))
-(3 rows)
-
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
- QUERY PLAN
--------------------------------------------------
- Append
- -> Seq Scan on hp2
- Filter: ((a = 1) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a is null and b = 'xxx';
- QUERY PLAN
------------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a IS NULL) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a = 10) AND (b = 'xxx'::text))
-(3 rows)
-
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
- QUERY PLAN
---------------------------------------------------
- Append
- -> Seq Scan on hp3
- Filter: ((a = 10) AND (b = 'yyy'::text))
-(3 rows)
-
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
- -> Seq Scan on hp3
- Filter: (((a = 10) AND (b = 'yyy'::text)) OR ((a = 10) AND (b = 'xxx'::text)) OR ((a IS NULL) AND (b IS NULL)))
-(5 rows)
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
- QUERY PLAN
----------------------------------------------------
- Append
- -> Seq Scan on hp0
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp1
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp2
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
- -> Seq Scan on hp3
- Filter: ((a <> 1) AND (b <> 'xxx'::text))
-(9 rows)
-
-drop table hp;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 0d3a27ed41..839d8a4a4d 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -116,7 +116,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid c
# ----------
# Another group of parallel tests
# ----------
-test: identity partition_join partition_prune partition_prune_hash reloptions hash_part indexing partition_aggregate fast_default
+test: identity partition_join partition_prune reloptions hash_part indexing partition_aggregate fast_default
# event triggers cannot run concurrently with any test that runs DDL
test: event_trigger
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 20027c131c..12e10b3ce4 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -185,7 +185,6 @@ test: xml
test: identity
test: partition_join
test: partition_prune
-test: partition_prune_hash
test: reloptions
test: hash_part
test: indexing
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 4929a3628b..d508a69456 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2367,21 +2367,14 @@ DROP TABLE quuux;
-- check validation when attaching hash partitions
--- The default hash functions as they exist today aren't portable; they can
--- return different results on different machines. Depending upon how the
--- values are hashed, the row may map to different partitions, which result in
--- regression failure. To avoid this, let's create a non-default hash function
--- that just returns the input value unchanged.
-CREATE OR REPLACE FUNCTION dummy_hashint4(a int4, seed int8) RETURNS int8 AS
-$$ BEGIN RETURN (a + 1 + seed); END; $$ LANGUAGE 'plpgsql' IMMUTABLE;
-CREATE OPERATOR CLASS custom_opclass FOR TYPE int4 USING HASH AS
-OPERATOR 1 = , FUNCTION 2 dummy_hashint4(int4, int8);
+-- Use hand-rolled hash functions and operator class to get predictable result
+-- on different matchines. part_test_int4_ops is defined in insert.sql.
-- check that the new partition won't overlap with an existing partition
CREATE TABLE hash_parted (
a int,
b int
-) PARTITION BY HASH (a custom_opclass);
+) PARTITION BY HASH (a part_test_int4_ops);
CREATE TABLE hpart_1 PARTITION OF hash_parted FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE fail_part (LIKE hpart_1);
ALTER TABLE hash_parted ATTACH PARTITION fail_part FOR VALUES WITH (MODULUS 8, REMAINDER 4);
@@ -2519,8 +2512,6 @@ SELECT * FROM list_parted;
DROP TABLE list_parted, list_parted2, range_parted;
DROP TABLE fail_def_part;
DROP TABLE hash_parted;
-DROP OPERATOR CLASS custom_opclass USING HASH;
-DROP FUNCTION dummy_hashint4(a int4, seed int8);
-- more tests for certain multi-level partitioning scenarios
create table p (a int, b int) partition by range (a, b);
diff --git a/src/test/regress/sql/hash_part.sql b/src/test/regress/sql/hash_part.sql
index 94c5eaab0c..f457ac344c 100644
--- a/src/test/regress/sql/hash_part.sql
+++ b/src/test/regress/sql/hash_part.sql
@@ -2,18 +2,12 @@
-- Hash partitioning.
--
-CREATE OR REPLACE FUNCTION hashint4_noop(int4, int8) RETURNS int8 AS
-$$SELECT coalesce($1,0)::int8$$ LANGUAGE sql IMMUTABLE;
-CREATE OPERATOR CLASS test_int4_ops FOR TYPE int4 USING HASH AS
-OPERATOR 1 = , FUNCTION 2 hashint4_noop(int4, int8);
-
-CREATE OR REPLACE FUNCTION hashtext_length(text, int8) RETURNS int8 AS
-$$SELECT length(coalesce($1,''))::int8$$ LANGUAGE sql IMMUTABLE;
-CREATE OPERATOR CLASS test_text_ops FOR TYPE text USING HASH AS
-OPERATOR 1 = , FUNCTION 2 hashtext_length(text, int8);
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different machines. See the definitions of
+-- part_test_int4_ops and part_test_text_ops in insert.sql.
CREATE TABLE mchash (a int, b text, c jsonb)
- PARTITION BY HASH (a test_int4_ops, b test_text_ops);
+ PARTITION BY HASH (a part_test_int4_ops, b part_test_text_ops);
CREATE TABLE mchash1
PARTITION OF mchash FOR VALUES WITH (MODULUS 4, REMAINDER 0);
@@ -54,7 +48,7 @@ SELECT satisfies_hash_partition('mchash'::regclass, 2, 1, NULL::int, NULL::int);
SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 0, ''::text);
-- ok, should be true
-SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 1, ''::text);
+SELECT satisfies_hash_partition('mchash'::regclass, 4, 0, 2, ''::text);
-- argument via variadic syntax, should fail because not all partitioning
-- columns are of the correct type
@@ -63,7 +57,7 @@ SELECT satisfies_hash_partition('mchash'::regclass, 2, 1,
-- multiple partitioning columns of the same type
CREATE TABLE mcinthash (a int, b int, c jsonb)
- PARTITION BY HASH (a test_int4_ops, b test_int4_ops);
+ PARTITION BY HASH (a part_test_int4_ops, b part_test_int4_ops);
-- now variadic should work, should be false
SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
@@ -71,7 +65,7 @@ SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
-- should be true
SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
- variadic array[1, 0]);
+ variadic array[0, 1]);
-- wrong length
SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
@@ -84,7 +78,3 @@ SELECT satisfies_hash_partition('mcinthash'::regclass, 4, 0,
-- cleanup
DROP TABLE mchash;
DROP TABLE mcinthash;
-DROP OPERATOR CLASS test_text_ops USING hash;
-DROP OPERATOR CLASS test_int4_ops USING hash;
-DROP FUNCTION hashint4_noop(int4, int8);
-DROP FUNCTION hashtext_length(text, int8);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a16f2a7f89..a7f659bc2b 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -228,16 +228,36 @@ select tableoid::regclass::text, a, min(b) as min_b, max(b) as max_b from list_p
-- direct partition inserts should check hash partition bound constraint
--- create custom operator class and hash function, for the same reason
--- explained in alter_table.sql
-create or replace function dummy_hashint4(a int4, seed int8) returns int8 as
-$$ begin return (a + seed); end; $$ language 'plpgsql' immutable;
-create operator class custom_opclass for type int4 using hash as
-operator 1 = , function 2 dummy_hashint4(int4, int8);
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different machines. The hash function for int4 simply returns
+-- the sum of the values passed to it and the one for text returns the length
+-- of the non-empty string value passed to it or 0.
+
+create or replace function part_hashint4_noop(value int4, seed int8)
+returns int8 as $$
+select value + seed;
+$$ language sql immutable;
+
+create operator class part_test_int4_ops
+for type int4
+using hash as
+operator 1 =,
+function 2 part_hashint4_noop(int4, int8);
+
+create or replace function part_hashtext_length(value text, seed int8)
+RETURNS int8 AS $$
+select length(coalesce(value, ''))::int8
+$$ language sql immutable;
+
+create operator class part_test_text_ops
+for type text
+using hash as
+operator 1 =,
+function 2 part_hashtext_length(text, int8);
create table hash_parted (
a int
-) partition by hash (a custom_opclass);
+) partition by hash (a part_test_int4_ops);
create table hpart0 partition of hash_parted for values with (modulus 4, remainder 0);
create table hpart1 partition of hash_parted for values with (modulus 4, remainder 1);
create table hpart2 partition of hash_parted for values with (modulus 4, remainder 2);
@@ -263,8 +283,6 @@ from hash_parted order by part;
-- cleanup
drop table range_parted, list_parted;
drop table hash_parted;
-drop operator class custom_opclass using hash;
-drop function dummy_hashint4(a int4, seed int8);
-- test that a default partition added as the first partition accepts any value
-- including null
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 7fe93bbc04..19dd381514 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -238,6 +238,48 @@ explain (costs off) select * from rparted_by_int2 where a > 100000000000000;
drop table lp, coll_pruning, rlp, mc3p, mc2p, boolpart, rp, coll_pruning_multi, like_op_noprune, lparted_by_int2, rparted_by_int2;
+--
+-- Test Partition pruning for HASH partitioning
+--
+-- Use hand-rolled hash functions and operator classes to get predictable
+-- result on different machines. See the definitions of
+-- part_test_int4_ops and part_test_text_ops in insert.sql.
+--
+
+create table hp (a int, b text) partition by hash (a part_test_int4_ops, b part_test_text_ops);
+create table hp0 partition of hp for values with (modulus 4, remainder 0);
+create table hp3 partition of hp for values with (modulus 4, remainder 3);
+create table hp1 partition of hp for values with (modulus 4, remainder 1);
+create table hp2 partition of hp for values with (modulus 4, remainder 2);
+
+insert into hp values (null, null);
+insert into hp values (1, null);
+insert into hp values (1, 'xxx');
+insert into hp values (null, 'xxx');
+insert into hp values (2, 'xxx');
+insert into hp values (1, 'abcde');
+select tableoid::regclass, * from hp order by 1;
+
+-- partial keys won't prune, nor would non-equality conditions
+explain (costs off) select * from hp where a = 1;
+explain (costs off) select * from hp where b = 'xxx';
+explain (costs off) select * from hp where a is null;
+explain (costs off) select * from hp where b is null;
+explain (costs off) select * from hp where a < 1 and b = 'xxx';
+explain (costs off) select * from hp where a <> 1 and b = 'yyy';
+explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
+
+-- pruning should work if either a value or a IS NULL clause is provided for
+-- each of the keys
+explain (costs off) select * from hp where a is null and b is null;
+explain (costs off) select * from hp where a = 1 and b is null;
+explain (costs off) select * from hp where a = 1 and b = 'xxx';
+explain (costs off) select * from hp where a is null and b = 'xxx';
+explain (costs off) select * from hp where a = 2 and b = 'xxx';
+explain (costs off) select * from hp where a = 1 and b = 'abcde';
+explain (costs off) select * from hp where (a = 1 and b = 'abcde') or (a = 2 and b = 'xxx') or (a is null and b is null);
+
+drop table hp;
--
-- Test runtime partition pruning
@@ -587,4 +629,4 @@ select * from boolp where a = (select value from boolvalues where not value);
drop table boolp;
-reset enable_indexonlyscan;
\ No newline at end of file
+reset enable_indexonlyscan;
diff --git a/src/test/regress/sql/partition_prune_hash.sql b/src/test/regress/sql/partition_prune_hash.sql
deleted file mode 100644
index fd1783bf53..0000000000
--- a/src/test/regress/sql/partition_prune_hash.sql
+++ /dev/null
@@ -1,41 +0,0 @@
---
--- Test Partition pruning for HASH partitioning
--- We keep this as a seperate test as hash functions return
--- values will vary based on CPU architecture.
---
-
-create table hp (a int, b text) partition by hash (a, b);
-create table hp0 partition of hp for values with (modulus 4, remainder 0);
-create table hp3 partition of hp for values with (modulus 4, remainder 3);
-create table hp1 partition of hp for values with (modulus 4, remainder 1);
-create table hp2 partition of hp for values with (modulus 4, remainder 2);
-
-insert into hp values (null, null);
-insert into hp values (1, null);
-insert into hp values (1, 'xxx');
-insert into hp values (null, 'xxx');
-insert into hp values (10, 'xxx');
-insert into hp values (10, 'yyy');
-select tableoid::regclass, * from hp order by 1;
-
--- partial keys won't prune, nor would non-equality conditions
-explain (costs off) select * from hp where a = 1;
-explain (costs off) select * from hp where b = 'xxx';
-explain (costs off) select * from hp where a is null;
-explain (costs off) select * from hp where b is null;
-explain (costs off) select * from hp where a < 1 and b = 'xxx';
-explain (costs off) select * from hp where a <> 1 and b = 'yyy';
-
--- pruning should work if non-null values are provided for all the keys
-explain (costs off) select * from hp where a is null and b is null;
-explain (costs off) select * from hp where a = 1 and b is null;
-explain (costs off) select * from hp where a = 1 and b = 'xxx';
-explain (costs off) select * from hp where a is null and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'xxx';
-explain (costs off) select * from hp where a = 10 and b = 'yyy';
-explain (costs off) select * from hp where (a = 10 and b = 'yyy') or (a = 10 and b = 'xxx') or (a is null and b is null);
-
--- hash partitiong pruning doesn't occur with <> operator clauses
-explain (costs off) select * from hp where a <> 1 and b <> 'xxx';
-
-drop table hp;
--
2.11.0
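For readers skimming the patch above: the support functions it adds to insert.sql are deterministic by construction, which is why the expected output no longer depends on the machine. Below is a minimal C model of their arithmetic, based only on the SQL definitions in the patch. The model_* names are hypothetical, unsigned arithmetic is used here only to keep the C well-defined on overflow, and the sketch deliberately does not model how PostgreSQL combines per-column hashes or maps the result to a partition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of part_hashint4_noop(value int4, seed int8): "select value + seed".
 * Hypothetical name; unsigned math avoids signed-overflow undefined behavior.
 */
uint64_t
model_hashint4_noop(int32_t value, uint64_t seed)
{
    return (uint64_t) (int64_t) value + seed;
}

/*
 * Model of part_hashtext_length(value text, seed int8):
 * "select length(coalesce(value, ''))::int8". The seed is ignored, as in
 * the SQL version. strlen() counts bytes, which matches SQL length() only
 * for single-byte characters such as the ASCII strings used in the tests.
 */
uint64_t
model_hashtext_length(const char *value, uint64_t seed)
{
    (void) seed;
    return value == NULL ? 0 : (uint64_t) strlen(value);
}
```

Neither function depends on byte order or word size, so the same inputs give the same hash everywhere; for instance model_hashtext_length("xxx", 0) is 3 on any platform.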
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018/04/13 1:47, Robert Haas wrote:
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
Do you mean to do that for *all* files that have tests exercising some
partitioning code, even if it's just one test? I can see that tests in at
least some of them could be put into their own partition_ file as follows:
partition_insert (including tests in insert_conflict)
partition_update
partition_triggers
partition_indexing (indexing.sql added when partitioned indexes went in)
partition_ddl (for the tests in create_table and alter_table)
That leaves:
cluster
create_index (one test here could be moved to partition_indexing?)
foreign_data (could be moved to partition_ddl?)
foreign_key (could be moved to partition_ddl?)
hash_part (leave alone, because already contains 'part' in the name?)
identity
join
plancache
plpgsql
publication
rowsecurity
rules
stats_ext
tablesample
truncate
updatable_views
vacuum
What about the tests in inherit.sql that start with:
--
-- Check that constraint exclusion works correctly with partitions using
-- implicit constraints generated from the partition bound information.
--
Maybe, just move all of them to partition_prune.sql, because we no longer
use constraint exclusion for pruning?
Thanks,
Amit
On 13 April 2018 at 14:15, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/13 1:47, Robert Haas wrote:
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
Do you mean to do that for *all* files that have tests exercising some
partitioning code, even if it's just one test? I can see that tests in at
least some of them could be put into their own partition_ file as follows:
Wouldn't it be best to just move hash partition tests into hash_part?
Leave all the other stuff where it is?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 13, 2018 at 7:45 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/04/13 1:47, Robert Haas wrote:
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
Do you mean to do that for *all* files that have tests exercising some
partitioning code, even if it's just one test? I can see that tests in at
least some of them could be put into their own partition_ file as follows:
partition_insert (including tests in insert_conflict)
partition_update
partition_triggers
partition_indexing (indexing.sql added when partitioned indexes went in)
partition_ddl (for the tests in create_table and alter_table)
We haven't generally created test files specific to a table type, for
example temporary tables or unlogged tables; instead we have created
files by SQL command. But that's not true for indexes; we have
separate files for indexes, and we also have a separate file for
materialized views and files for various data types. So, our test file
organization seems to cut across SQL commands and object
types. But partitioning seems an area large enough to have files of
its own; we already have partition_join and partition_aggregate.
Do we want to move to a directory-based organization for tests also,
where sql/ and expected/ will have directories within them for various
types of objects like partitioned tables, indexes, regular tables,
datatypes etc., and each of those will have files organized by SQL
command? An immediate question arises as to where to add the files
which exercise a mixture of objects; maybe in the directory
corresponding to the primary object, e.g. tests of materialized views
over partitioned tables would fit in the materialized view (or just
views?) directory.
Whatever organization we want to use, it should be easy to find
testcases for relevant functionality e.g. all tests for partitioned
tables or all alter table command tests.
What about the tests in inherit.sql that start with:
--
-- Check that constraint exclusion works correctly with partitions using
-- implicit constraints generated from the partition bound information.
--
Maybe, just move all of them to partition_prune.sql, because we no longer
use constraint exclusion for pruning?
I think we need to have some testcases somewhere to test constraint
exclusion on partitions and partitioned tables; those do not
necessarily fit partition pruning.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Robert Haas wrote:
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
I don't think that's necessary to fix the problem that
partition_prune_hash.sql file has two expected output files. If you
want to propose such a reorganization, feel free, but let's not hijack
the patch at hand. For the record, I'm not a fan of the idea.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 13, 2018 at 10:50 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
Robert Haas wrote:
On Wed, Apr 11, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Here's an idea. Why don't we move the function/opclass creation lines
to insert.sql, without the DROPs, and use the same functions/opclasses
in the three tests insert.sql, alter_table.sql, hash_part.sql and
partition_prune.sql, i.e. not recreate what are essentially the same
objects three times? This also leaves them around for the pg_upgrade
test, which is not a bad thing.
That sounds good, but maybe we should go further and move the
partitioning tests out of generically-named things like insert.sql
altogether and have test names that actually mention partitioning.
I don't think that's necessary to fix the problem that
partition_prune_hash.sql file has two expected output files. If you
want to propose such a reorganization, feel free, but let's not hijack
the patch at hand. For the record, I'm not a fan of the idea.
Fair enough. I don't think I'm hijacking the thread much more than it
was already hijacked; and it was just a thought. I haven't really
studied the tests well enough to have a really clear idea what a
better organization would look like. It was just that, for example,
the commit that added hash partitioning added tests to 5 different
files, and some things had to be duplicated as a result. It sounds
like what you've already done improves that, but I was wondering if
there's a way to do better. I don't feel super-strongly about it
though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Amit Langote wrote:
PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
that there are hash partitioning related tests in create_table,
foreign_key, and partition_join files as well. Do we want to use the
custom opclass in those files as well?
By the way, let me suggest 'git grep -i' instead.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/04/21 0:58, Alvaro Herrera wrote:
Amit Langote wrote:
PS: git grep "partition by hash\|PARTITION BY HASH" on src/test indicates
that there are hash partitioning related tests in create_table,
foreign_key, and partition_join files as well. Do we want to use the
custom opclass in those files as well?
By the way, let me suggest 'git grep -i' instead.
Ah, thanks.
Regards,
Amit
Hello everyone in this thread!
I got a similar server crash as in [1] on the master branch since
commit 9fdb675fc5d2de825414e05939727de8b120ae81: an assertion fails
because the second argument of the ScalarArrayOpExpr is not a Const or an
ArrayExpr, but is an ArrayCoerceExpr (see [2]):
=# create table list_parted (
a varchar
) partition by list (a);
=# create table part_ab_cd partition of list_parted for values in ('ab',
'cd');
=# CREATE OR REPLACE FUNCTION public.x_stl_text_integer (
)
RETURNS text STABLE AS
$body$
BEGIN
RAISE NOTICE 's text integer';
RETURN 1::text;
END;
$body$
LANGUAGE 'plpgsql';
=# explain (costs off) select * from list_parted where a in ('ab', 'cd',
x_stl_text_integer());
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
[1]: /messages/by-id/CAKcux6nCsCmu9oUnnuKZkeBenYvUFbU2Lt4q2MFNEb7QErzn8w@mail.gmail.com
[2]: partprune.c, function match_clause_to_partition_key:
if (IsA(rightop, Const))
{
    ...
}
else
{
    ArrayExpr *arrexpr = castNode(ArrayExpr, rightop); /* fails here */
    ...
}
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Indeed, I can see the crash. I have been playing with this stuff and I
am in the middle of writing the patch, but let's track this properly for
now.
--
Michael
On 07-05-2018 4:37, Michael Paquier wrote:
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Indeed, I can see the crash. I have been playing with this stuff and I
am in the middle of writing the patch, but let's track this properly for
now.
Thank you very much!
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote:
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Indeed, I can see the crash. I have been playing with this stuff and I
am in the middle of writing the patch, but let's track this properly for
now.
So the problem appears when an expression needs to use
COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
which requires an execution state to be able to build the list of
elements. The clause matching happens at planning state, so while there
are surely cases where this could be improved depending on the
expression type, I propose to just discard all clauses which do not
match OpExpr and ArrayExpr for now, as per the attached. It would be
definitely a good practice to have a default code path returning
PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
Thoughts?
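The shape of the proposed fix can be sketched in isolation as follows. This is a simplified, self-contained model rather than the planner code: the NodeTag values, the Node struct, and match_array_operand are hypothetical stand-ins for the checks done in match_clause_to_partition_key, and MATCH_UNSUPPORTED plays the role of PARTCLAUSE_UNSUPPORTED.

```c
#include <stddef.h>

/* Hypothetical node tags standing in for the planner's expression types. */
typedef enum
{
    NODE_CONST,
    NODE_ARRAY_EXPR,
    NODE_ARRAY_COERCE_EXPR
} NodeTag;

/* Hypothetical result codes; MATCH_UNSUPPORTED models PARTCLAUSE_UNSUPPORTED. */
typedef enum
{
    MATCH_STEPS,
    MATCH_UNSUPPORTED
} MatchResult;

/* Minimal node: a tag plus the number of array elements known at plan time. */
typedef struct Node
{
    NodeTag tag;
    int     nelems;
} Node;

/*
 * Accept only operand types whose element list can be built at plan time.
 * Everything else takes the default branch instead of being cast blindly,
 * which is what tripped the assertion in the reported crash.
 */
MatchResult
match_array_operand(const Node *rightop, int *nelems_out)
{
    switch (rightop->tag)
    {
        case NODE_CONST:
        case NODE_ARRAY_EXPR:
            *nelems_out = rightop->nelems;
            return MATCH_STEPS;
        default:
            /* e.g. an array coercion that needs per-element execution */
            return MATCH_UNSUPPORTED;
    }
}
```

The design point is the explicit default branch: a node type added later falls through to "unsupported" (no pruning, but a correct plan) rather than reaching a cast that assumes a specific type.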
--
Michael
Attachments:
partprune-coerce-array.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..2d2f88e880 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -1690,7 +1690,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = lappend(elem_exprs, elem_expr);
}
}
- else
+ else if (IsA(rightop, ArrayExpr))
{
ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
@@ -1704,6 +1704,16 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = arrexpr->elements;
}
+ else
+ {
+ /*
+ * Ignore all other clause types. It could be possible here
+ * to reach this code path with a type coercion from an
+ * array type to another with ArrayCoerceExpr which depends on
+ * per-element execution for the conversion.
+ */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
/*
* Now generate a list of clauses, one for each array element, of the
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e0cc5f3393..86dcd62d55 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1073,6 +1073,48 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
+-- type coercion from one array type to another, no pruning
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_cd partition of coercepart for values in ('cd');
+create table coercepart_ef_gh partition of coercepart for values in ('ef', 'gh');
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_ef_gh
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a in ('ab', NULL, to_char(125, '999'));
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_ef_gh
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, NULL::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a in ('ef', 'gh', to_char(125, '999'));
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ef'::character varying, 'gh'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ef'::character varying, 'gh'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_ef_gh
+ Filter: ((a)::text = ANY ((ARRAY['ef'::character varying, 'gh'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+drop table coercepart;
--
-- some more cases
--
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6b7f57ab41..267b7a3545 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,6 +152,16 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
+-- type coercion from one array type to another, no pruning
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_cd partition of coercepart for values in ('cd');
+create table coercepart_ef_gh partition of coercepart for values in ('ef', 'gh');
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+explain (costs off) select * from coercepart where a in ('ab', NULL, to_char(125, '999'));
+explain (costs off) select * from coercepart where a in ('ef', 'gh', to_char(125, '999'));
+drop table coercepart;
+
--
-- some more cases
--
Thank you Marina for the report and Michael for following up.
On 2018/05/07 16:56, Michael Paquier wrote:
On Mon, May 07, 2018 at 10:37:10AM +0900, Michael Paquier wrote:
On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote:
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Indeed, I can see the crash. I have been playing with this stuff and I
am in the middle of writing the patch, but let's track this properly for
now.
So the problem appears when an expression needs to use
COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
which requires an execution state to be able to build the list of
elements. The clause matching happens at planning state, so while there
are surely cases where this could be improved depending on the
expression type, I propose to just discard all clauses which do not
match OpExpr and ArrayExpr for now, as per the attached. It would be
definitely a good practice to have a default code path returning
PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
Thoughts?
I have to agree to go with this conservative approach for now. Although
we might be able to evaluate the array elements by applying the coercion
specified by ArrayCoerceExpr, let's save that as an improvement to be
pursued later.
FWIW, constraint exclusion wouldn't prune in this case either (that is, if
you try this example with PG 10 or using HEAD as of the parent of
9fdb675fc5), but it doesn't crash like the new pruning code does.
Thanks again.
Regards,
Amit
On Tue, May 08, 2018 at 04:07:41PM +0900, Amit Langote wrote:
I have to agree to go with this conservative approach for now. Although
we might be able to evaluate the array elements by applying the coercion
specified by ArrayCoerceExpr, let's save that as an improvement to be
pursued later.
Thanks for confirming. Yes, non-volatile functions would be actually
safe, and we'd need to be careful about NULL handling as well, but
that's definitely out of scope for v11.
FWIW, constraint exclusion wouldn't prune in this case either (that is, if
you try this example with PG 10 or using HEAD as of the parent of
9fdb675fc5), but it doesn't crash like the new pruning code does.
Yeah, I have noticed that.
--
Michael
Michael Paquier wrote:
So the problem appears when an expression needs to use
COERCION_PATH_ARRAYCOERCE for a type coercion from one type to another,
which requires an execution state to be able to build the list of
elements. The clause matching happens at planning state, so while there
are surely cases where this could be improved depending on the
expression type, I propose to just discard all clauses which do not
match OpExpr and ArrayExpr for now, as per the attached. It would be
definitely a good practice to have a default code path returning
PARTCLAUSE_UNSUPPORTED where the element list cannot be built.
Thoughts?
I found a related crash and I'm investigating it further.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So I found that this query also crashed (using your rig),
create table coercepart (a varchar) partition by list (a);
create table coercepart_ab partition of coercepart for values in ('ab');
create table coercepart_bc partition of coercepart for values in ('bc');
create table coercepart_cd partition of coercepart for values in ('cd');
explain (costs off) select * from coercepart where a ~ any ('{ab}');
The reason for this crash is that gen_partprune_steps_internal() is
unable to generate any steps for the clause -- which is natural, since
the operator is not in a btree opclass. There are various callers
of gen_partprune_steps_internal that are aware that it could return an
empty set of steps, but this one in match_clause_to_partition_key for
the ScalarArrayOpExpr case was being a bit too optimistic.
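To make the failure mode concrete, here is a minimal standalone C sketch (hypothetical toy_* names, not the actual partprune.c code). The step generator may legitimately produce nothing, so the caller has to treat an empty result as "unsupported" rather than assert that at least one step came back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PARTCLAUSE_* result codes. */
typedef enum
{
    TOY_MATCH_STEPS,
    TOY_MATCH_CONTRADICT,
    TOY_UNSUPPORTED
} ToyMatchResult;

/* Toy description of the per-element clauses built from the array. */
typedef struct
{
    bool op_usable;         /* is the operator usable for pruning? */
    bool is_contradictory;  /* do the clauses contradict each other? */
    int  nclauses;          /* number of per-element clauses */
} ToyClauses;

/* Toy step generator: returns how many pruning steps it produced. */
static int
toy_gen_steps(const ToyClauses *clauses, bool *contradictory)
{
    *contradictory = clauses->is_contradictory;
    if (clauses->is_contradictory || !clauses->op_usable)
        return 0;           /* nothing could be generated */
    return clauses->nclauses;
}

ToyMatchResult
toy_match_saop(const ToyClauses *clauses, int *nsteps)
{
    bool contradictory;

    assert(clauses != NULL && nsteps != NULL);

    *nsteps = toy_gen_steps(clauses, &contradictory);
    if (contradictory)
        return TOY_MATCH_CONTRADICT;
    if (*nsteps == 0)
        return TOY_UNSUPPORTED;     /* step generation failed */
    return TOY_MATCH_STEPS;
}
```

With the empty-result check in place, a clause such as a regex match against the partition key simply yields no pruning.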
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
partprune-coerce-array-2.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f8aaccfa18 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
* For BoolExpr clauses, we recursively generate steps for each argument, and
* return a PartitionPruneStepCombine of their results.
*
- * The generated steps are added to the context's steps list. Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list. Each step is assigned a step identifier, unique
+ * even across recursive calls.
*
* If we find clauses that are mutually contradictory, or a pseudoconstant
* clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
List *elem_exprs,
*elem_clauses;
ListCell *lc1;
+ bool contradictory;
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
* Only allow strict operators. This will guarantee nulls are
* filtered.
*/
- if (!op_strict(saop->opno))
+ if (!op_strict(saop_op))
return PARTCLAUSE_UNSUPPORTED;
/* Useless if the array has any volatile functions. */
@@ -1690,7 +1692,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = lappend(elem_exprs, elem_expr);
}
}
- else
+ else if (IsA(rightop, ArrayExpr))
{
ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
@@ -1704,6 +1706,11 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = arrexpr->elements;
}
+ else
+ {
+ /* Give up on any other clause types. */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
/*
* Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1729,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
}
/*
- * Build a combine step as if for an OR clause or add the clauses to
- * the end of the list that's being processed currently.
+ * If we have an ANY clause and multiple elements, first turn the list
+ * of clauses into an OR expression.
*/
if (saop->useOr && list_length(elem_clauses) > 1)
- {
- Expr *orexpr;
- bool contradictory;
+ elem_clauses = list_make1(makeBoolExpr(OR_EXPR, elem_clauses, -1));
- orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
- *clause_steps =
- gen_partprune_steps_internal(context, rel, list_make1(orexpr),
- &contradictory);
- if (contradictory)
- return PARTCLAUSE_MATCH_CONTRADICT;
-
- Assert(list_length(*clause_steps) == 1);
- return PARTCLAUSE_MATCH_STEPS;
- }
- else
- {
- bool contradictory;
-
- *clause_steps =
- gen_partprune_steps_internal(context, rel, elem_clauses,
- &contradictory);
- if (contradictory)
- return PARTCLAUSE_MATCH_CONTRADICT;
- Assert(list_length(*clause_steps) >= 1);
- return PARTCLAUSE_MATCH_STEPS;
- }
+ /* Finally, generate steps */
+ *clause_steps =
+ gen_partprune_steps_internal(context, rel, elem_clauses,
+ &contradictory);
+ if (contradictory)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ else if (*clause_steps == NIL)
+ return PARTCLAUSE_UNSUPPORTED; /* step generation failed */
+ return PARTCLAUSE_MATCH_STEPS;
}
else if (IsA(clause, NullTest))
{
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e0cc5f3393..cf331e79c1 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1073,6 +1073,72 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
+-- test scalar-to-array operators
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_bc partition of coercepart for values in ('bc');
+create table coercepart_cd partition of coercepart for values in ('cd');
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a ~ any ('{ab}');
+ QUERY PLAN
+----------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a !~ all ('{ab}');
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a ~ any ('{ab,bc}');
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a !~ all ('{ab,bc}');
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+(7 rows)
+
+drop table coercepart;
--
-- some more cases
--
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6b7f57ab41..1464f4dcd9 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,6 +152,20 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
+-- test scalar-to-array operators
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_bc partition of coercepart for values in ('bc');
+create table coercepart_cd partition of coercepart for values in ('cd');
+
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+explain (costs off) select * from coercepart where a ~ any ('{ab}');
+explain (costs off) select * from coercepart where a !~ all ('{ab}');
+explain (costs off) select * from coercepart where a ~ any ('{ab,bc}');
+explain (costs off) select * from coercepart where a !~ all ('{ab,bc}');
+
+drop table coercepart;
+
--
-- some more cases
--
Hi.
On 2018/05/09 7:05, Alvaro Herrera wrote:
So I found that this query also crashed (using your rig),
create table coercepart (a varchar) partition by list (a);
create table coercepart_ab partition of coercepart for values in ('ab');
create table coercepart_bc partition of coercepart for values in ('bc');
create table coercepart_cd partition of coercepart for values in ('cd');
explain (costs off) select * from coercepart where a ~ any ('{ab}');
The reason for this crash is that gen_partprune_steps_internal() is
unable to generate any steps for the clause -- which is natural, since
the operator is not in a btree opclass. There are various callers
of gen_partprune_steps_internal that are aware that it could return an
empty set of steps, but this one in match_clause_to_partition_key for
the ScalarArrayOpExpr case was being a bit too optimistic.
Yeah, good catch! That fixes the crash, but looking around that code a
bit, it seems that we shouldn't even have gotten up to the point you're
proposing to fix. If saop_op is not in the partitioning opfamily, it
should have bailed out much sooner, that is, here:
/*
* In case of NOT IN (..), we get a '<>', which we handle if list
* partitioning is in use and we're able to confirm that its negator
* is a btree equality operator belonging to the partitioning operator
* family.
*/
if (!op_in_opfamily(saop_op, partopfamily))
{
<snip>
negator = get_negator(saop_op);
if (OidIsValid(negator) && op_in_opfamily(negator, partopfamily))
{
<snip>
}
+ else
+ /* otherwise, unsupported! */
+ return PARTCLAUSE_UNSUPPORTED;
Let me propose that we also have this along with the rest of the changes
you made in that part of the function. So, attached is an updated patch.
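As a rough illustration of that earlier bail-out, the following standalone C sketch uses made-up operator numbers and a made-up opfamily table (none of these are real PostgreSQL OIDs or catalog lookups). An operator outside the opfamily is accepted only when its negator is the opfamily's equality operator and list partitioning is in use; every other case returns "unsupported".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up operator identifiers; 0 plays the role of an invalid OID. */
enum { TOY_INVALID = 0, TOY_EQ, TOY_NE, TOY_LT, TOY_REGEX, TOY_NOT_REGEX };

/* Made-up btree opfamily: only equality and less-than belong to it. */
static const int toy_opfamily[] = { TOY_EQ, TOY_LT };

static bool
toy_op_in_opfamily(int op)
{
    size_t n = sizeof(toy_opfamily) / sizeof(toy_opfamily[0]);

    for (size_t i = 0; i < n; i++)
        if (toy_opfamily[i] == op)
            return true;
    return false;
}

/* Negators are defined only for the pairs this sketch needs. */
static int
toy_get_negator(int op)
{
    switch (op)
    {
        case TOY_EQ:        return TOY_NE;
        case TOY_NE:        return TOY_EQ;
        case TOY_REGEX:     return TOY_NOT_REGEX;
        case TOY_NOT_REGEX: return TOY_REGEX;
        default:            return TOY_INVALID;
    }
}

/* Returns true when the operator can be used for pruning in this toy model. */
bool
toy_saop_op_supported(int op, bool list_partitioned)
{
    int negator;

    assert(op != TOY_INVALID);

    if (toy_op_in_opfamily(op))
        return true;

    negator = toy_get_negator(op);
    if (negator != TOY_INVALID && toy_op_in_opfamily(negator))
    {
        /* Only "<>" (negator of equality) under list partitioning. */
        return list_partitioned && negator == TOY_EQ;
    }

    /* Neither the operator nor its negator is in the opfamily. */
    return false;
}
```

In this model the regex operators are rejected before any per-element clauses are built, which is the behaviour the added else branch is meant to give.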
Thanks,
Amit
Attachments:
partprune-coerce-array-3.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..be9ea8a6db 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -571,8 +571,9 @@ get_matching_partitions(PartitionPruneContext *context, List *pruning_steps)
* For BoolExpr clauses, we recursively generate steps for each argument, and
* return a PartitionPruneStepCombine of their results.
*
- * The generated steps are added to the context's steps list. Each step is
- * assigned a step identifier, unique even across recursive calls.
+ * The return value is a list of the steps generated, which are also added to
+ * the context's steps list. Each step is assigned a step identifier, unique
+ * even across recursive calls.
*
* If we find clauses that are mutually contradictory, or a pseudoconstant
* clause that contains false, we set *contradictory to true and return NIL
@@ -1599,6 +1600,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
List *elem_exprs,
*elem_clauses;
ListCell *lc1;
+ bool contradictory;
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
@@ -1617,7 +1619,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
* Only allow strict operators. This will guarantee nulls are
* filtered.
*/
- if (!op_strict(saop->opno))
+ if (!op_strict(saop_op))
return PARTCLAUSE_UNSUPPORTED;
/* Useless if the array has any volatile functions. */
@@ -1650,6 +1652,9 @@ match_clause_to_partition_key(RelOptInfo *rel,
if (strategy != BTEqualStrategyNumber)
return PARTCLAUSE_UNSUPPORTED;
}
+ else
+ /* otherwise, unsupported! */
+ return PARTCLAUSE_UNSUPPORTED;
}
/*
@@ -1690,7 +1695,7 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = lappend(elem_exprs, elem_expr);
}
}
- else
+ else if (IsA(rightop, ArrayExpr))
{
ArrayExpr *arrexpr = castNode(ArrayExpr, rightop);
@@ -1704,6 +1709,11 @@ match_clause_to_partition_key(RelOptInfo *rel,
elem_exprs = arrexpr->elements;
}
+ else
+ {
+ /* Give up on any other clause types. */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
/*
* Now generate a list of clauses, one for each array element, of the
@@ -1722,36 +1732,21 @@ match_clause_to_partition_key(RelOptInfo *rel,
}
/*
- * Build a combine step as if for an OR clause or add the clauses to
- * the end of the list that's being processed currently.
+ * If we have an ANY clause and multiple elements, first turn the list
+ * of clauses into an OR expression.
*/
if (saop->useOr && list_length(elem_clauses) > 1)
- {
- Expr *orexpr;
- bool contradictory;
+ elem_clauses = list_make1(makeBoolExpr(OR_EXPR, elem_clauses, -1));
- orexpr = makeBoolExpr(OR_EXPR, elem_clauses, -1);
- *clause_steps =
- gen_partprune_steps_internal(context, rel, list_make1(orexpr),
- &contradictory);
- if (contradictory)
- return PARTCLAUSE_MATCH_CONTRADICT;
-
- Assert(list_length(*clause_steps) == 1);
- return PARTCLAUSE_MATCH_STEPS;
- }
- else
- {
- bool contradictory;
-
- *clause_steps =
- gen_partprune_steps_internal(context, rel, elem_clauses,
- &contradictory);
- if (contradictory)
- return PARTCLAUSE_MATCH_CONTRADICT;
- Assert(list_length(*clause_steps) >= 1);
- return PARTCLAUSE_MATCH_STEPS;
- }
+ /* Finally, generate steps */
+ *clause_steps =
+ gen_partprune_steps_internal(context, rel, elem_clauses,
+ &contradictory);
+ if (contradictory)
+ return PARTCLAUSE_MATCH_CONTRADICT;
+ else if (*clause_steps == NIL)
+ return PARTCLAUSE_UNSUPPORTED; /* step generation failed */
+ return PARTCLAUSE_MATCH_STEPS;
}
else if (IsA(clause, NullTest))
{
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index e0cc5f3393..cf331e79c1 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1073,6 +1073,72 @@ explain (costs off) select * from boolpart where a is not unknown;
Filter: (a IS NOT UNKNOWN)
(7 rows)
+-- test scalar-to-array operators
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_bc partition of coercepart for values in ('bc');
+create table coercepart_cd partition of coercepart for values in ('cd');
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text = ANY ((ARRAY['ab'::character varying, (to_char(125, '999'::text))::character varying])::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a ~ any ('{ab}');
+ QUERY PLAN
+----------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text ~ ANY ('{ab}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a !~ all ('{ab}');
+ QUERY PLAN
+-----------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text !~ ALL ('{ab}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a ~ any ('{ab,bc}');
+ QUERY PLAN
+-------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text ~ ANY ('{ab,bc}'::text[]))
+(7 rows)
+
+explain (costs off) select * from coercepart where a !~ all ('{ab,bc}');
+ QUERY PLAN
+--------------------------------------------------------
+ Append
+ -> Seq Scan on coercepart_ab
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_bc
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+ -> Seq Scan on coercepart_cd
+ Filter: ((a)::text !~ ALL ('{ab,bc}'::text[]))
+(7 rows)
+
+drop table coercepart;
--
-- some more cases
--
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 6b7f57ab41..1464f4dcd9 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -152,6 +152,20 @@ explain (costs off) select * from boolpart where a is not true and a is not fals
explain (costs off) select * from boolpart where a is unknown;
explain (costs off) select * from boolpart where a is not unknown;
+-- test scalar-to-array operators
+create table coercepart (a varchar) partition by list (a);
+create table coercepart_ab partition of coercepart for values in ('ab');
+create table coercepart_bc partition of coercepart for values in ('bc');
+create table coercepart_cd partition of coercepart for values in ('cd');
+
+explain (costs off) select * from coercepart where a in ('ab', to_char(125, '999'));
+explain (costs off) select * from coercepart where a ~ any ('{ab}');
+explain (costs off) select * from coercepart where a !~ all ('{ab}');
+explain (costs off) select * from coercepart where a ~ any ('{ab,bc}');
+explain (costs off) select * from coercepart where a !~ all ('{ab,bc}');
+
+drop table coercepart;
+
--
-- some more cases
--
On Tue, May 08, 2018 at 07:05:46PM -0300, Alvaro Herrera wrote:
The reason for this crash is that gen_partprune_steps_internal() is
unable to generate any steps for the clause -- which is natural, since
the operator is not in a btree opclass. There are various callers
of gen_partprune_steps_internal that are aware that it could return an
empty set of steps, but this one in match_clause_to_partition_key for
the ScalarArrayOpExpr case was being a bit too optimistic.
Indeed.
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
--
Michael
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Yeah, making it static might be a good idea. I had made it externally
visible, because I was under the impression that the runtime pruning
related code would want to call it from elsewhere within the planner.
But, instead it introduced a make_partition_pruneinfo() which in turn
calls gen_partprune_steps.
Thanks,
Amit
On 9 May 2018 at 14:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Yeah, making it static might be a good idea. I had made it externally
visible, because I was under the impression that the runtime pruning
related code would want to call it from elsewhere within the planner.
But, instead it introduced a make_partition_pruneinfo() which in turn
calls gen_partprune_steps.
Yeah. Likely left over from when run-time pruning was generating the
steps during execution rather than during planning.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/05/09 11:31, David Rowley wrote:
On 9 May 2018 at 14:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Yeah, making it static might be a good idea. I had made it externally
visible, because I was under the impression that the runtime pruning
related code would want to call it from elsewhere within the planner.
But, instead it introduced a make_partition_pruneinfo() which in turn
calls gen_partprune_steps.
Yeah. Likely left over from when run-time pruning was generating the
steps during execution rather than during planning.
Here is a patch that does that.
Thanks,
Amit
Attachments:
make-gen_partprune_steps-static.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f954b92a6b..f1f7b2dea9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -116,6 +116,8 @@ typedef struct PruneStepResult
} PruneStepResult;
+static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
+ bool *contradictory);
static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
RelOptInfo *rel, List *clauses,
bool *contradictory);
@@ -355,7 +357,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
* If the clauses in the input list are contradictory or there is a
* pseudo-constant "false", *contradictory is set to true upon return.
*/
-List *
+static List *
gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory)
{
GeneratePruningStepsContext context;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index c9fe95dc30..3d114b4c71 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -67,7 +67,5 @@ extern List *make_partition_pruneinfo(PlannerInfo *root, List *partition_rels,
extern Relids prune_append_rel_partitions(RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
-extern List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
- bool *contradictory);
#endif /* PARTPRUNE_H */
On Wed, May 09, 2018 at 02:01:26PM +0900, Amit Langote wrote:
On 2018/05/09 11:31, David Rowley wrote:
On 9 May 2018 at 14:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Yeah, making it static might be a good idea. I had made it externally
visible, because I was under the impression that the runtime pruning
related code would want to call it from elsewhere within the planner.
But, instead it introduced a make_partition_pruneinfo() which in turn
calls gen_partprune_steps.
Yeah. Likely left over from when run-time pruning was generating the
steps during execution rather than during planning.
Here is a patch that does that.
Thanks, Amit.
Alvaro, could it be possible to consider as well the patch I posted
here?
/messages/by-id/20180424012042.GD1570@paquier.xyz
This removes a useless default clause in partprune.c and it got
forgotten in the crowd. Just attaching it again here, and it can just
be applied on top of the rest.
--
Michael
Attachments:
partprune-useless-default.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index f8844ef2eb..cbbb4c1827 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -2950,10 +2950,6 @@ perform_pruning_combine_step(PartitionPruneContext *context,
}
}
break;
-
- default:
- elog(ERROR, "invalid pruning combine op: %d",
- (int) cstep->combineOp);
}
return result;
Michael Paquier wrote:
Alvaro, could it be possible to consider as well the patch I posted
here?
/messages/by-id/20180424012042.GD1570@paquier.xyz
This removes a useless default clause in partprune.c and it got
forgotten in the crowd. Just attaching it again here, and it can just
be applied on top of the rest.
Done, thanks for insisting.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Amit Langote wrote:
On 2018/05/09 11:31, David Rowley wrote:
On 9 May 2018 at 14:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Here is a patch that does that.
Pushed, thanks.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Marina Polyakova wrote:
Hello everyone in this thread!
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Hello Marina, thanks for reporting this. I have pushed all fixes
derived from this report -- thanks to Amit and Michaël for those.
I verified your test case no longer crashes. If you have more elaborate
test cases, please do try these too.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 09, 2018 at 10:39:07AM -0300, Alvaro Herrera wrote:
Michael Paquier wrote:
This removes a useless default clause in partprune.c and it got
forgotten in the crowd. Just attaching it again here, and it can just
be applied on top of the rest.
Done, thanks for insisting.
Thanks!
--
Michael
On 2018/05/09 22:43, Alvaro Herrera wrote:
Amit Langote wrote:
On 2018/05/09 11:31, David Rowley wrote:
On 9 May 2018 at 14:29, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2018/05/09 11:20, Michael Paquier wrote:
While looking at this code, is there any reason to not make
gen_partprune_steps static? This is only used in partprune.c for now,
so the intention is to make it available for future patches?
Here is a patch that does that.
Pushed, thanks.
Thank you.
Regards,
Amit
On 09-05-2018 17:30, Alvaro Herrera wrote:
Marina Polyakova wrote:
Hello everyone in this thread!
I got a similar server crash as in [1] on the master branch since the commit
9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because
the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is
an ArrayCoerceExpr (see [2]):
Hello Marina, thanks for reporting this. I have pushed all fixes
derived from this report -- thanks to Amit and Michaël for those.
I verified your test case no longer crashes. If you have more elaborate
test cases, please do try these too.
Hello, thank you all very much! :)
--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company